FOREWORD



Army training developers need tools to aid in the design,
acquisition, and use of simulation- and computer-based programs
of instruction for weapon operation and maintenance. One
critical need is a job aid for the design and evaluation of
training devices during all stages in the weapon acquisition
cycle.

This series of three reports describes one approach to such
aiding: a hybrid of decision analysis and mathematical modeling.
The approach provides numerical estimates of device effectiveness
that are based on expert ratings of trainee and task
characteristics, functional and physical similarity between
the proposed device and the operational equipment, and the
instructional characteristics of the device. It is an analytic,
computer-based technique (a menu-driven system) that can be
used at any stage of training device design.

The product of this research can help training device
procurers such as PM-TRADE and training developers in TRADOC
make better-documented decisions about training device design.




EDGAR M. JOHNSON 
Technical Director 






ACKNOWLEDGMENTS 



The authors would like to acknowledge the assistance 
of the raters who contributed their time, effort, and 
brainpower: 

George R. Wheaton 
Daniel B. Felker 
Harris H. Shettel 
David L. Winter 
Basil MacDonald. 






Forecasting Device Effectiveness: III. Analytic
Assessment of DEFT 

EXECUTIVE SUMMARY 



Requirement: 

To analytically address the numeric and scalar properties
of the Device Effectiveness Forecasting Technique
(DEFT), and to examine interrater agreement
by analyzing three training devices.

Procedure: 

Several analytic procedures were conducted to address
various aspects of the scalar properties of DEFT. These
procedures included Monte Carlo simulations to assess the
interpretation of DEFT output, the sensitivity of DEFT
parameters, the comparison of outputs, and stability, as well
as an assessment of interrater agreement.

Findings: 

Results indicated that it would be necessary to incorporate
assumptions regarding the expected distributions of input
variables in order to interpret DEFT output meaningfully.
The Monte Carlo analyses also demonstrated the sensitivity
of DEFT output scores to variations in inputs, and
assessed the effects of various assumptions regarding
measurement error on output scores.

The interrater agreement issue was addressed by having 
several raters apply DEFT to three actual training devices. 
Results indicated a high degree of consistency among raters 
for all devices and for all levels of DEFT. 

Utilization of Findings:

These findings indicate that, with few modifications,
DEFT can be used effectively and reliably to evaluate
training device-based training systems analytically.






FORECASTING DEVICE EFFECTIVENESS: III. ANALYTIC 
ASSESSMENT OF DEFT 

CONTENTS 



Page

1. Introduction 1

2. Monte Carlo Analyses 4

General Technical Approach to the Monte Carlo Analysis 4

Interpretation of Output 6 

Sensitivity 13 

Comparison of Outputs 21 

Stability Analyses 25 

3. Interrater Agreement 42 

Method 43 

Results 53 

Summary 70 

References 72 






LIST OF FIGURES AND TABLES

Page

FIGURES

1. FREQUENCY DISTRIBUTION OF DEFT CONDITION 3 DIFFERENCES 23

2. DISTRIBUTION OF DEFT TOTAL DIFFERENCES FOR HYPOTHETICAL JUDGE 1 (RELIABLE TO ±5) VERSUS "TRUTH" 36

3. DISTRIBUTION OF DEFT TOTAL DIFFERENCES FOR HYPOTHETICAL JUDGE 2 (RELIABLE TO ±5) VERSUS "TRUTH" 37

4. DISTRIBUTION OF DEFT TOTAL DIFFERENCES FOR HYPOTHETICAL JUDGE 1 VERSUS JUDGE 2 (PERFECT AGREEMENT; BOTH RELIABLE TO ±5) 38

5. DISTRIBUTION OF DEFT TOTAL DIFFERENCES FOR HYPOTHETICAL JUDGE 1 (RELIABLE TO ±10) VERSUS "TRUTH" 39

6. DISTRIBUTION OF DEFT TOTAL DIFFERENCES FOR HYPOTHETICAL JUDGE 2 (RELIABLE TO ±10) VERSUS "TRUTH" 40

7. DISTRIBUTION OF DEFT TOTAL DIFFERENCES FOR HYPOTHETICAL JUDGE 1 VERSUS HYPOTHETICAL JUDGE 2 (PERFECT AGREEMENT; BOTH RELIABLE TO ±10) 41

8. GUNNERY ENGAGEMENT 44

TABLES

1. CONDITION 1 RESULTS: UNIFORM INPUT; INITIAL RANGES AND COMBINATIONS 8

2. CONDITION 2 RESULTS: UNIFORM INPUT; ALL RANGES 1-100; INITIAL COMBINATION 8

3. CONDITION 3 RESULTS: UNIFORM INPUT; ALL RANGES 1-100; SQUARE ROOT TRANSFORMATION 8

4. CONDITION 4 RESULTS: TRUNCATED NORMAL INPUT; INITIAL DEFT COMBINATIONS 11

5. CONDITION 5 RESULTS: TRUNCATED NORMAL INPUT; SQUARE ROOT TRANSFORMATION OF DENOMINATOR 12

6. DESCRIPTIVE STATISTICS FOR DEFT: FOR COMPARISON WITH SENSITIVITY ANALYSES 15

7. SENSITIVITY ANALYSIS FOR PD 16

8. SENSITIVITY ANALYSIS FOR D 16

9. SENSITIVITY ANALYSIS FOR R 17

10. SENSITIVITY ANALYSIS FOR RPD (PD') 18

11. SENSITIVITY ANALYSIS FOR RD (D') 18

12. SENSITIVITY ANALYSIS FOR PS 19

13. SENSITIVITY ANALYSIS FOR FS 19






14. SENSITIVITY ANALYSIS FOR RR (R') 20

15. DESCRIPTIVE STATISTICS FOR DEFT CONDITION 3 DIFFERENCE ANALYSIS 22

16. PROBABILITY DISTRIBUTION OF DEFT CONDITION 3 DIFFERENCES 24

17. SCALE BIAS ANALYSIS FOR DEFT (UNIFORM INPUT DISTRIBUTIONS) 27

18. HYPOTHETICAL "TRUE" RESULTS FOR DEFT 29

19. RESULTS FOR HYPOTHETICAL JUDGE 1: DEVIATION OF ±5 FROM "TRUTH" 30

20. RESULTS FOR HYPOTHETICAL JUDGE 2: DEVIATION OF ±5 FROM "TRUTH" 31

21. RESULTS FOR HYPOTHETICAL JUDGE 1: DEVIATION OF ±10 FROM "TRUTH" 32

22. RESULTS FOR HYPOTHETICAL JUDGE 2: DEVIATION OF ±10 FROM "TRUTH" 33

23. DISTRIBUTIONS OF DEFT TOTAL DIFFERENCES 34

24. MPS AND E-3A TASKS AND SUBTASKS 49

25. DEFT INDEX VALUES: BOT 55

26. DEFT INDEX VALUES: VIGS 56

27. DEFT INDEX VALUES: MPS 57

28. MEANS AND STANDARD DEVIATIONS OF PAIRED RATER COMPARISONS FOR EACH TRAINING DEVICE: DEFT I 64

29. MEANS AND STANDARD DEVIATIONS OF PAIRED RATER COMPARISONS FOR EACH TRAINING DEVICE: DEFT II 66

30. MEANS AND STANDARD DEVIATIONS OF PAIRED RATER COMPARISONS FOR EACH TRAINING DEVICE: DEFT III 67






1. Introduction

This report is submitted in partial fulfillment of
Contract MDA 903-ff2-C-0414 between the Army Research
Institute (ARI) and the American Institutes for Research
(AIR). It is part of a programmatic effort to develop and
analytically evaluate a model designed to forecast training
device effectiveness. Specifically, this report describes
the analytic evaluation phase of the effort.

Previous reports in this series have discussed issues
related to the evaluation of a training system (Rose &
Wheaton, 1984a) and presented an analytic model (Rose &
Wheaton, 1984b). This model, named the Device
Effectiveness Forecasting Technique (DEFT), incorporates
numerous ratings and judgments regarding components of the
training situation and the operational performance requirement,
and generates forecasts of training device effectiveness.
In lieu of empirical tests, Rose and Wheaton (1984a)
outlined several analytic methods that could be employed to
assess the adequacy of such a model.

Decisions and Designs, Inc. (DDI) and AIR employed
five such methods in the evaluation of DEFT:

• Interpretation of output: what sorts of results can
be expected from DEFT?

• Sensitivity analysis: what is the impact on DEFT
output of varying input parameter values?

• Comparison of outputs: what do differences in
scores received by various devices mean?

• Stability: what is the impact of disagreement between
raters on component scores?

• Interrater agreement: when DEFT is applied to three training
devices, to what extent do raters agree on
each of the various ratings and judgments?

The first four of these questions were addressed using
Monte Carlo analysis. The general approach was to simulate
applications of the DEFT model by generating 5,000 random
values (within the appropriate ranges) for each of the
various DEFT inputs (Performance Deficit, Difficulty,
etc.)* and combining them according to the DEFT formulae,
yielding 5,000 DEFT output scores. For the "interpretation
of output" issue, this analysis, repeated under different
conditions, constituted the entire computational activity.
Sensitivity analysis was performed using a variation on the
basic analysis: random values were generated for all but
one of the input parameters; to examine the sensitivity of
the output score to the value of the remaining input parameter,
this parameter was stepped through its range of
values in an orderly fashion, and output scores were computed
for each of the values that it assumed. For "comparison
of outputs," the basic analysis was performed twice
to obtain two 5,000-element vectors of output scores. One
vector was subtracted from the other, resulting in a vector
of differences. A frequency distribution computed for this
vector allows significance testing of difference values.
Finally, the impact of less than perfect interrater
stability was explored by simulating "measurement error"
and scale bias and examining their effects on the DEFT
output.

*For details regarding the components of DEFT, combination
rules, output variables, and rating procedures, see Rose &
Wheaton (1984b).

The basic procedure for assessing interrater agreement
was to have six raters apply DEFT to three training
devices. Model outputs were compared using various statistical
techniques. This document presents the results of
the five sets of analyses. First, we present the
general technical approach to the Monte Carlo analyses,
followed by their results. We then present the
details of the interrater agreement study.



2. Monte Carlo Analyses

General Technical Approach to the Monte Carlo Analysis 

As we mentioned in the introduction, Monte Carlo
analysis was used to simulate applications of DEFT in order
to address each of the four basic questions (interpretation,
sensitivity, comparison of outputs, and stability).

Eight input variables were used in these analyses:*

• Performance Deficit (PD) 

• Difficulty (D) 

• Training Acquisition Efficiency (AE) 

• Residual Performance Deficit (RPD) 

• Residual Learning Difficulty (RLD) 

• Physical Similarity (PS) 

• Functional Similarity (FS) 

• Transfer Efficiency (TT) 

* Abbreviations are those used in report II. 

These variables are obtained in different ways for each of
the three levels of DEFT. However, since these different 
methods all result in equivalent scales (e.g., "Performance 
Deficit" has a range of 0-100 for all three DEFT levels), 
it was decided to use these variables in the Monte Carlo 
analyses. 

Since the distribution of DEFT outputs (the basic 
product of each analysis) depends on the distribution of 
the inputs, selection of input distributions was key. 
Because DEFT is a new tool that has not been applied to the 
evaluation of a large number of training devices, no em- 
pirical distributions of inputs currently exist. 
Therefore, it was necessary to use artificial input dis- 
tributions. The analysts working on this task selected the 
uniform distribution (i.e., all input values have the same 
probability of being selected) as the standard for input to 
the Monte Carlo analyses. This represents an extremely 
conservative approach; it was selected to provide a "worst 
case" baseline for comparisons with other sets of 
assumptions. 

In addition to selecting a distributional form for input
to the analyses, it was necessary to decide on the
number of trials, or simulated model applications, for each
analysis. The criterion used to select the number of
trials was the degree of convergence between (1) a distribution
of data points generated randomly from an underlying
uniform distribution and (2) the theoretical uniform distribution.
Convergence was examined for numbers of trials
ranging from 1,000 to 9,000. The number 5,000 was chosen
because it is cost-effective for this application:
convergence is almost as good for 5,000 trials as for 9,000
trials, and substantially less computing power is required.
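The convergence check described above can be illustrated with a short Python sketch. The report does not name the statistic it used to judge convergence; this sketch assumes the Kolmogorov-Smirnov distance (the largest gap between the sample's empirical distribution and the theoretical Uniform(0, 100) distribution), and the function names are hypothetical.

```python
import random


def ks_distance_uniform(sample, lo=0.0, hi=100.0):
    """Kolmogorov-Smirnov distance between a sample and the Uniform(lo, hi) CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = (x - lo) / (hi - lo)  # theoretical uniform CDF at x
        # Compare the CDF with the empirical step function just before and at x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def convergence_by_trials(trial_counts, seed=1984):
    """For each number of trials n, draw n uniform values and measure the KS distance."""
    rng = random.Random(seed)
    return {
        n: ks_distance_uniform([rng.uniform(0.0, 100.0) for _ in range(n)])
        for n in trial_counts
    }


if __name__ == "__main__":
    for n, d in convergence_by_trials(range(1000, 10000, 1000)).items():
        print(f"{n:5d} trials: KS distance = {d:.4f}")
```

Because this distance tends to shrink roughly in proportion to 1/√n, moving from 5,000 to 9,000 trials buys only a modest further improvement, which is in line with the cost argument above.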

Thus, each Monte Carlo analysis of DEFT output simu- 
lates 5,000 random applications of the DEFT model. This 
basic analysis was performed under a variety of conditions 
that depended upon the question to be answered. Tabular 
and, where appropriate, graphic presentations of results 
appear in the following sections. 

Interpretation of Output 

The objective of this first set of analyses was to ex- 
plore the distributional characteristics of the DEFT out- 
put. This was done under five different conditions, three 
using uniform distributions, and two using truncated normal 
distributions. The conditions were: 
1) Uniform input distributions; denominator input
variables (i.e., acquisition and transfer efficiency
measures; see Rose & Wheaton, 1984b,
Chapter 6) range from one to 100; all others
range from zero to 100. Inputs combined using the
initial DEFT model.

2) Uniform input distributions; all input variables
range from one to 100. Inputs combined using the initial
DEFT model.

3) Uniform input distributions; all inputs range from
one to 100. Square root taken of the denominator (efficiency)
variables (e.g., AE = √(R/100) instead of
AE = R/100); otherwise, combination identical to the
initial DEFT model.

4) Truncated normal input distributions. Inputs combined
using the initial DEFT model.

5) Truncated normal input distributions. Square root
taken of the efficiency variables; otherwise, combination
identical to the initial DEFT model.
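A minimal sketch of why the square root transformation in Conditions 3 and 5 tightens the output. It assumes only what the condition descriptions state: an efficiency rating R from 1 to 100 becomes AE = R/100 (or √(R/100)), and AE acts as a divisor. The helper names are hypothetical, and the full DEFT combination rules are not reproduced.

```python
import math


def efficiency(rating, sqrt_transform=False):
    """Map an efficiency rating R (1-100) to the divisor AE = R/100 or sqrt(R/100)."""
    ae = rating / 100.0
    return math.sqrt(ae) if sqrt_transform else ae


def divisor_multiplier_range(sqrt_transform):
    """Smallest and largest factor 1/AE over integer ratings R = 1..100."""
    factors = [1.0 / efficiency(r, sqrt_transform) for r in range(1, 101)]
    return min(factors), max(factors)


if __name__ == "__main__":
    print("initial model:   ", divisor_multiplier_range(False))
    print("square root form:", divisor_multiplier_range(True))
```

Under the initial model a rating of 1 inflates the quantity it divides by a factor of 100; after the square root transformation the largest inflation is a factor of 10, which is consistent with the much smaller variances reported for Condition 3.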

Tables 1 through 3 summarize results for intermediate
and output variables under Conditions 1, 2, and 3. In
these tables:

TP = Training Problem
(A) ACQ = Total Acquisition Score
AD = Additional Deficit
TRP = Transfer Problem
(T) TRANS = Total Transfer Score
(A+T) TOTAL = Total Score

Table 1. CONDITION 1 RESULTS: UNIFORM INPUT;
INITIAL RANGES AND COMBINATIONS

DESCRIPTIVE STATISTICS FOR MODEL DEFT
5000 TRIALS

VARIABLE          MEAN  VARIANCE   STD DEV   MINIMUM   MAXIMUM
TP               24.87    491.21     22.16       .00     99.00
ACQ (A)         131.36 177317.58    421.09       .00   8722.00
AD               16.76    555.15     23.56       .00     99.00
TRP              41.71   1039.74     32.25       .00    168.22
TRANS (T)       217.69 390344.31    624.78       .00  11967.00
TOTAL (A+T)     349.04 572816.19    756.85       .00  12268.29

Table 2. CONDITION 2 RESULTS: UNIFORM INPUT;
ALL RANGES 1-100; INITIAL COMBINATION

DESCRIPTIVE STATISTICS FOR MODEL DEFT
5000 TRIALS

VARIABLE          MEAN  VARIANCE   STD DEV   MINIMUM   MAXIMUM
TP               25.12    486.31     22.05       .04    100.00
ACQ (A)         131.75 150900.78    388.46       .06   8700.00
AD               16.99    557.64     23.61       .00     97.00
TRP              42.59   1069.20     32.70       .06    188.00
TRANS (T)       211.42 398557.33    631.31       .07  11450.00
TOTAL (A+T)     343.17 555211.20    745.12      1.63  11466.21

Table 3. CONDITION 3 RESULTS: UNIFORM INPUT;
ALL RANGES 1-100; SQUARE ROOT TRANSFORMATION

DESCRIPTIVE STATISTICS FOR MODEL DEFT
5000 TRIALS

VARIABLE          MEAN  VARIANCE   STD DEV   MINIMUM   MAXIMUM
TP               25.37    488.26     22.10       .03     99.00
ACQ (A)          47.23   3751.51     61.25       .06    872.20
AD               16.58    544.03     23.32       .00     98.00
TRP              42.04   1026.01     32.07       .08        --
TRANS (T)        78.33   8156.73     90.31       .09   1201.60
TOTAL (A+T)     125.56  11770.24    108.49       .96   1284.90

(-- = illegible in the source copy)

The most striking features of these results are the
high variances displayed in Conditions 1 and 2: the output
distributions are extremely diffuse given uniform input
distributions. In Condition 3, the output distributions
are substantially tighter because of the square root
transformation in the denominators (for efficiency values
between 0 and 1, the square root is larger than the value itself,
so the transformation makes the denominator larger and narrows
the range of the output).

Since the obtained values for the variance of scores
in the first two conditions would make the interpretation
of DEFT output relatively meaningless, we decided to modify
the assumption of uniform input distributions. Based on
our familiarity with training devices in general, and with
U.S. Army training devices in particular, we hypothesized
distributions for each input parameter. The truncated normal
input distributions for Conditions 4 and 5 were the
following:
VARIABLE                                   MODE    RANGE
PD (Performance Deficit)                     70    30-90
D (Difficulty)                               55    10-100
AE (R) (Training Efficiency)                 65    25-100
RPD (Residual Performance Deficit)           30    1-65
RD (RLD) (Residual Learning Difficulty)      50    10-90
PS (Physical Similarity)                     80    30-100
FS (Functional Similarity)                   70    45-100
RR (TT) (Transfer Efficiency)                35    10-90

These distributions were obtained by transforming a standard
normal distribution centered at zero and truncated at
-3 and +3. The mode of the standard normal distribution
(always zero) was mapped to the mode of the target range, and
the truncated value of -3 was mapped to the endpoint furthest
below the mode (e.g., for a mode of 70 and a range of 30-90,
-3 was mapped to 30); finally, the target distribution was
truncated appropriately at the other end of the range.
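The mapping can be sketched in Python as follows. The text describes the case in which the lower endpoint lies furthest from the mode; the sketch assumes the rule is mirrored when the upper endpoint is further away (as for Transfer Efficiency, mode 35, range 10-90), and it interprets "truncated" as redrawing any value that falls outside the range rather than clipping it. The function name is hypothetical.

```python
import random


def truncated_normal(mode, lo, hi, rng):
    """Draw one input value using the mode/range mapping described above.

    A standard normal deviate is truncated at -3 and +3. Zero maps to the mode,
    and a deviate of 3 toward the farther endpoint maps onto that endpoint.
    Values that land beyond the nearer endpoint are rejected and redrawn.
    """
    scale = max(mode - lo, hi - mode) / 3.0
    while True:
        z = rng.gauss(0.0, 1.0)
        if abs(z) > 3.0:
            continue  # truncate the standard normal at +/-3
        x = mode + z * scale
        if lo <= x <= hi:
            return x  # otherwise truncated at the nearer endpoint


if __name__ == "__main__":
    rng = random.Random(5000)
    pd_values = [truncated_normal(70, 30, 90, rng) for _ in range(5000)]
    mean = sum(pd_values) / len(pd_values)
    print(f"PD: mean {mean:.2f}, min {min(pd_values):.2f}, max {max(pd_values):.2f}")
```

For Performance Deficit (mode 70, range 30-90) the simulated mean should fall close to the 68.12 reported for PD in Table 4.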

Results for Conditions 4 and 5 are summarized in 
Tables 4 and 5. Variances are substantially lower for both 
of these conditions than for Conditions 1 through 3, be- 
cause of the changes in the assumptions about input 
distributions; and variance is lower for Condition 5 than 
Condition 4 on account of the square root transformation. 
Table 4. CONDITION 4 RESULTS: TRUNCATED NORMAL
INPUT; INITIAL DEFT COMBINATIONS

DESCRIPTIVE STATISTICS FOR MODEL DEFT (RESTRICTED RANGES)
5000 TRIALS

VARIABLE          MEAN  VARIANCE   STD DEV   MINIMUM   MAXIMUM
PD               68.12    133.16     11.54     30.32     89.99
D                54.84    223.19     14.94     11.02     99.62
R (AE)           65.01    171.62     13.10     26.81    100.00
RPD              30.23    127.40     11.29      1.10     64.71
RD (RLD)         50.40    177.50     13.32     10.32     89.60
PS               76.14    187.78     13.70     30.40     99.98
FS               70.17     91.65      9.57     45.12     99.70
RR (TT)          38.53    239.35     15.47     10.01     89.54
TP               37.33    144.59     12.02      5.85     81.07
ACQ (A)          59.97    561.05     23.69      8.10    196.83
AD               10.23    127.95     11.31       .00     52.86
TRP              25.48    176.43     13.28       .83     78.05
TRANS (T)        80.49   3951.16     62.88      1.16        --
TOTAL (A+T)     140.46   4381.49     66.19        --        --

(-- = illegible in the source copy)
Table 5. CONDITION 5 RESULTS: TRUNCATED NORMAL
INPUT; SQUARE ROOT TRANSFORMATION OF DENOMINATOR

DESCRIPTIVE STATISTICS FOR MODEL DEFT (RESTRICTED RANGES)
5000 TRIALS

VARIABLE          MEAN  VARIANCE   STD DEV   MINIMUM   MAXIMUM
PD               68.12    133.16     11.54     30.32     89.99
D                51.81    223.19     14.94     11.02     99.62
R (AE)           65.04    171.62     13.10     26.84    100.00
RPD              30.23    127.40     11.29      1.10     61.71
RD (RLD)         50.40    177.50     13.32     10.32     89.60
PS               76.11    187.78     13.70     30.10     99.98
FS               70.17     91.65      9.57     45.12     99.70
RR (TT)          38.53    239.35     15.47     10.01     89.54
TP               37.33    144.59     12.02      5.85     81.07
ACQ (A)          47.04    253.81     15.93      6.88    122.90
AD               10.23    127.95     11.31       .00     52.86
TRP              25.48    176.13     13.28       .83     78.05
TRANS (T)        44.07    674.29     25.97      1.10    193.88
TOTAL (A+T)      91.11    896.99     29.95     18.51    233.76
Thus, based on some reasonable assumptions regarding
the distribution of expected input values, we see that DEFT
outputs are interpretable and meaningful in both an absolute
and a relative sense. For example, a device receiving
a Training Problem (TP) score of 65.0 could be interpreted
as addressing a "larger" problem than a typical
device (mean = 37.33, s.d. = 12.02, Condition 4): its score lies
about 2.3 standard deviations above the mean.
Differences between ratings for two devices on obtained
scores could be interpreted with reference to expected
scores.
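The TP example amounts to a standardized score. A minimal sketch using the Condition 4 mean and standard deviation from Table 4; the normal-curve percentile is an added approximation that is not part of DEFT, since the simulated TP distribution is not exactly normal.

```python
from statistics import NormalDist


def standardized(score, mean, sd):
    """Express a DEFT score in standard-deviation units relative to a reference distribution."""
    return (score - mean) / sd


# Condition 4 Training Problem (TP) statistics from Table 4.
z = standardized(65.0, mean=37.33, sd=12.02)
# Percentile under a normal approximation (an assumption; the simulated TP distribution is truncated).
pct = NormalDist().cdf(z)
print(f"z = {z:.2f}, approximate percentile = {pct:.3f}")
```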

Sensitivity 

Eight sensitivity analyses were performed, one for
each of the DEFT input parameters. The objective of these
analyses was to explore the impact of changes in input parameter
values on the values of intermediate and output
variables.

The analyses were conducted using Condition 3 of DEFT
(as described above): all input variables are assumed to be
distributed uniformly between one and 100, and the training and
transfer efficiency variables are subjected to square root
transformations.
Table 6 shows DEFT results when all inputs vary
freely; Tables 7 through 14 show how these results vary
with systematic variation of each input parameter.
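The one-at-a-time stepping procedure used for Tables 7 through 14 can be sketched generically. The DEFT combination formulae are given in Report II rather than here, so the sketch plugs in a hypothetical toy model (a product of two deficit-like inputs divided by a square-root-transformed efficiency, loosely echoing Condition 3); only the procedure, not the model, is meant to mirror the report.

```python
import random
import statistics


def one_at_a_time(model, input_names, stepped, steps, trials=5000, seed=0):
    """Sensitivity analysis: fix `stepped` at each value in `steps`, draw every other
    input uniformly on [1, 100], and return {step: (mean, sd)} of the model output."""
    rng = random.Random(seed)
    summary = {}
    for value in steps:
        outputs = []
        for _ in range(trials):
            inputs = {name: rng.uniform(1.0, 100.0) for name in input_names if name != stepped}
            inputs[stepped] = value
            outputs.append(model(inputs))
        summary[value] = (statistics.mean(outputs), statistics.stdev(outputs))
    return summary


def toy_model(x):
    """Hypothetical stand-in, NOT the DEFT formulae: a deficit product divided by
    a square-root-transformed efficiency term."""
    return (x["PD"] * x["D"] / 100.0) / ((x["AE"] / 100.0) ** 0.5)


if __name__ == "__main__":
    table = one_at_a_time(toy_model, ["PD", "D", "AE"], "AE", [1, 25, 50, 75, 100])
    for value, (mean, sd) in table.items():
        print(f"AE = {value:3d}: mean {mean:7.1f}, sd {sd:7.1f}")
```

With the toy model, stepping AE from 1 to 100 changes the mean output by a factor of about ten, the same qualitative pattern the report describes for the efficiency variables.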

As might have been expected, the efficiency variables
have the largest effect on the means and standard deviations
of the output scores. For example, across the range
of input values, changing training efficiency scores
changes the Total Score mean from 334.0 to
103.5 and the standard deviation from 140.0 to
96.0. In general, varying each of the other inputs changes
the Total Score mean by approximately 100 points and its
standard deviation by approximately 40 points.

Another way of looking at these results is to say that 
all scales (except Efficiency) have equivalent effects on 
the Total Score — an extreme value on any single scale will 
have the same effect as an extreme value on any other. 
Hence, all scales are "weighted" equally. The logical (and 
analytic) exceptions are the efficiency scales: a device 
that incorporates poor training or transfer principles 
would be expected to have a larger effect on training time, 
expense, and effort than any single component, since poor 
techniques will affect all aspects of the training and/or 
transfer problem. 
Table 6. DESCRIPTIVE STATISTICS FOR DEFT:
FOR COMPARISON WITH SENSITIVITY ANALYSES

DESCRIPTIVE STATISTICS FOR MODEL DEFT SENSITIVITY ANALYSIS
5000 TRIALS

NAME              MEAN  VARIANCE    ST DEV   MINIMUM   MAXIMUM
PD               50.73    822.50     28.68      1.00    100.00
D                50.09    832.99     28.86      1.00    100.00
R (AE)              --    829.88     28.81      1.00    100.00
RPD              50.25    829.97     28.81      1.00    100.00
RD (RLD)         50.53    826.55     28.75      1.00    100.00
PS               50.17    829.67     28.80      1.00    100.00
FS               50.58    846.28     29.09      1.00    100.00
RR (TT)          50.45    834.94     28.90      1.00    100.00
TP               25.53    498.47     22.33       .03     99.00
ACQ (A)          47.08   3654.26     60.45       .03    837.20
AD               16.79    559.71     23.66       .00     98.00
TRP              42.00   1035.41     32.18       .02    179.14
TRANS (T)        79.74   9694.16     98.46       .03   1211.60
TOTAL (A+T)     126.83  13410.99    115.81      1.16   1266.40

(-- = illegible in the source copy)
Table 7. SENSITIVITY ANALYSIS FOR PD

        TP            ACQ (A)       AD            TRP           TRANS (T)     TOTAL (A+T)
PD      MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV
1         .5     .3     .9     .9   16.5   23.7   42.0   32.4   77.9   93.6   78.0   93.6
25      12.7    7.2   23.6   23.0   16.5   23.7   42.0   32.4   77.9   93.6  101.4   96.1
50      25.1   11.1   47.2   46.0   16.5   23.7   42.0   32.4   77.9   93.6  125.0  103.8
75      38.1   31.6   70.7   69.0   16.5   23.7   42.0   32.4   77.9   93.6  148.6  115.6
100     50.8   28.8   94.3   92.0   16.5   23.7   42.0   32.4   77.9   93.6  172.2  130.5
Table 8. SENSITIVITY ANALYSIS FOR D

        TP            ACQ (A)       AD            TRP           TRANS (T)     TOTAL (A+T)
D       MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV
1         .5     .3     .9     .9   16.5   23.7   42.0   32.4   77.9   93.6   78.0   93.6
25      12.6    7.1   23.3   22.6   16.5   23.7   42.0   32.4   77.9   93.6  101.2   96.1
50      25.2   11.2   16.6   15.2   16.5   23.7   42.0   32.4   77.9   93.6  121.5  103.6
75      37.7   21.3   70.0   67.7   16.5   23.7   42.0   32.4   77.9   93.6  147.8  115.1
100     50.3   28.1   93.3   90.3   16.5   23.7   42.0   32.4   77.9   93.6  171.1     --

(-- = illegible in the source copy)
Table 9. SENSITIVITY ANALYSIS FOR R (AE)

        TP            ACQ (A)       AD            TRP           TRANS (T)     TOTAL (A+T)
R       MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV
1       25.6   22.2  256.2  221.9   16.5   23.7   42.0   32.4   77.9   93.6  334.0  210.0
25      25.6   22.2   51.2   11.4   16.5   23.7   42.0   32.4   77.9   93.6  129.1  103.2
50      25.6   22.2   36.2   31.1   16.5   23.7   42.0   32.4   77.9   93.6  114.1   98.1
75      25.6   22.2   29.6   25.6   16.5   23.7   42.0   32.4   77.9   93.6  107.4   96.8
100     25.6   22.2   25.6   22.2   16.5   23.7   42.0   32.4   77.9   93.6  103.5   96.0
Table 10. SENSITIVITY ANALYSIS FOR RPD (PD')

        TP            ACQ (A)       AD            TRP           TRANS (T)     TOTAL (A+T)
RPD     MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV
1       25.6   22.2   47.7   61.3   16.5   23.7   17.0   23.7   31.8   60.9   79.5     --
25      25.6   22.2   47.7   61.3   16.5   23.7   29.2   21.8   51.3   69.7  102.0   92.1
50      25.6   22.2   47.7   61.3   16.5   23.7   11.8   27.8   77.7   84.3  125.4  103.8
75      25.6   22.2   47.7   61.3   16.5   23.7   51.4   32.2  101.1  102.0  148.8  118.9
100     25.6   22.2   47.7   61.3   16.5   23.7   67.1   37.5  121.5  121.6  172.1     --

(-- = illegible in the source copy)
Table 11. SENSITIVITY ANALYSIS FOR RD (D') (RLD)

        TP            ACQ (A)       AD            TRP           TRANS (T)     TOTAL (A+T)
RD      MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV
1       25.6   22.2   47.7   61.3   16.5   23.7   17.0   23.7   31.8     --   79.5     --
25      25.6   22.2   47.7   61.3   16.5   23.7   29.1   21.7   51.1   71.3  102.0     --
50      25.6   22.2   47.7   61.3   16.5   23.7   41.7   27.6   77.8   87.1  125.5     --
75      25.6   22.2   47.7   61.3   16.5   23.7   51.3   31.9  101.2  105.8  148.9  122.3
100     25.6   22.2   47.7   61.3   16.5   23.7   66.9   37.1  121.7  126.1  172.1     --

(-- = illegible in the source copy)
Table 12. SENSITIVITY ANALYSIS FOR PS

        TP            ACQ (A)       AD            TRP           TRANS (T)     TOTAL (A+T)
PS      MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV
1       25.6   22.2   47.7   61.3     .0     .0   25.1   22.2   16.9   59.1   91.6     --
25      25.6   22.2   47.7   61.3    3.1    6.3   28.5   23.1   52.8   61.1  100.5     --
50      25.6   22.2   47.7   61.3   12.1   16.0   37.8   27.1   70.1   80.5  119.1  101.5
75      25.6   22.2   47.7   61.3   27.9   21.8   53.3   33.3   99.1  101.6  146.8  121.1
100     25.6   22.2   47.7   61.3   19.6   29.0   75.1   36.6  110.2  130.9  187.8  111.1

(-- = illegible in the source copy)
Table 13. SENSITIVITY ANALYSIS FOR FS

        TP            ACQ (A)       AD            TRP           TRANS (T)     TOTAL (A+T)
FS      MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV
1       25.6   22.2   47.7   61.3   48.5   29.1   71.0   36.6  137.6  130.8  185.3     --
25      25.6   22.2   47.7   61.3   27.8   21.9   53.2   33.1   99.0  105.3  146.7     --
50      25.6   22.2   47.7   61.3   12.1   16.1   37.8   27.1   70.1   80.6  117.0  101.8
75      25.6   22.2   47.7   61.3    3.1    6.5   28.5   23.1   52.8   63.8  100.5     --
100     25.6   22.2   47.7   61.3     .0     .0   25.1   22.2   16.9   59.1   91.6     --

(-- = illegible in the source copy)
Table 14. SENSITIVITY ANALYSIS FOR RR (R') (TT)

        TP            ACQ (A)       AD            TRP           TRANS (T)     TOTAL (A+T)
RR      MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV
1       25.6   22.2   47.7   61.3   16.5   23.7   42.0   32.4  419.5  324.3  467.2     --
25      25.6   22.2   47.7   61.3   16.5   23.7   42.0   32.4   83.9   64.9  131.6     --
50      25.6   22.2   47.7   61.3   16.5   23.7   42.0   32.4   59.3   45.9  107.0   75.6
75      25.6   22.2   47.7   61.3   16.5   23.7   42.0   32.4   48.1   37.1   96.1   71.0
100     25.6   22.2   47.7   61.3   16.5   23.7   42.0   32.4   42.0   32.4   89.6     --

(-- = illegible in the source copy)
Comparison of Outputs

The objective of the "comparison of outputs" analysis 
is to determine the probability of any given level of dif- 
ference between two DEFT TOTAL scores. To this end, two 
DEFT TOTAL output vectors (5000 data points each) were 
generated, and one was subtracted from the other to obtain 
a frequency distribution of differences. Table 15 sum- 
marizes the three distributions. 

It should be noted that the two TOTAL distributions 
were generated using Condition 3 above, which assumes 
uniformly distributed inputs; as was noted before, this is 
an extremely conservative assumption. 

Figure 1 shows a frequency distribution of the differences;
as expected, the differences are distributed approximately
normally with a mean very close to zero.
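The difference analysis can be sketched as follows. The sketch assumes two already-simulated TOTAL vectors of equal length; the demonstration fills them with hypothetical exponential draws purely as stand-ins for DEFT output, and the function names are illustrative. The cumulative listing has the same form as Table 16; the two-tailed cutoff is an added convenience rather than a statistic computed in the report.

```python
import bisect
import math
import random


def difference_table(scores_a, scores_b, step=10):
    """Subtract two simulated output vectors and list, for each threshold,
    the cumulative proportion of differences at or below it."""
    diffs = sorted(a - b for a, b in zip(scores_a, scores_b))
    n = len(diffs)
    start = step * math.floor(diffs[0] / step)
    stop = step * math.ceil(diffs[-1] / step)
    return [(t, bisect.bisect_right(diffs, t) / n) for t in range(start, stop + step, step)]


def two_tailed_cutoff(scores_a, scores_b, alpha=0.05):
    """Empirical (1 - alpha) quantile of |difference|: a larger gap between two
    devices occurs in at most about alpha of the simulated pairs."""
    gaps = sorted(abs(a - b) for a, b in zip(scores_a, scores_b))
    k = math.ceil((1 - alpha) * len(gaps)) - 1
    return gaps[max(k, 0)]


if __name__ == "__main__":
    rng = random.Random(3)
    # Hypothetical stand-ins for two TOTAL vectors (right-skewed, mean about 126.5);
    # real vectors would come from running the DEFT model 5,000 times each.
    total1 = [rng.expovariate(1 / 126.5) for _ in range(5000)]
    total2 = [rng.expovariate(1 / 126.5) for _ in range(5000)]
    for t, cum in difference_table(total1, total2, step=100):
        print(f"{t:6d} {cum:.4f}")
    print("5% two-tailed cutoff:", round(two_tailed_cutoff(total1, total2), 1))
```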

Table 16 summarizes the probability distribution based
on this analysis. This table can be used to determine
statistical significance, although it is extremely conservative
because of the underlying distributional assumptions.
According to this table, two devices would need to differ
by approximately 150 points in the Total Score to be judged
Table 15. DESCRIPTIVE STATISTICS FOR DEFT CONDITION 3
DIFFERENCE ANALYSIS

DESCRIPTIVE STATISTICS FOR MODEL DEFT DIFFERENCE ANALYSIS
5000 TRIALS

NAME              MEAN  VARIANCE    ST DEV   MINIMUM   MAXIMUM
TOTAL1          126.52  12869.87    113.45       .87   1335.32
TOTAL2          126.51  12320.60    111.00      1.56   1163.96
DIFFER             .01  24404.78    156.22  -1118.48   1222.42
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BEST COPY AVAILABLE 



[Figure 1 plot: FREQUENCY DISTRIBUTION OF DIFFERENCE FOR MODEL DEFT, 5000 TRIALS; horizontal axis: VALUE OF VARIABLE, about -400 to 400; NOTE: EACH * REPRESENTS ABOUT 5 DATA POINT(S).]

Figure 1. FREQUENCY DISTRIBUTION OF DEFT CONDITION 3 DIFFERENCES






Table 16. PROBABILITY DISTRIBUTION OF DEFT CONDITION 3 DIFFERENCES

SIGNIFICANCE TABLE FOR MODEL DEFT DIFFERENCE ANALYSIS
5000 TRIALS

  DIFF  CUM         DIFF  CUM         DIFF  CUM         DIFF  CUM         DIFF  CUM
 -1140  .0000       -660  (illeg.)    -180  .0752        300  .9734        780  .9986
 -1130  .0000       -650  .0044       -170  .0834        310  .9750        790  .9986
 -1120  .0000       -640  (illeg.)    -160  .0932        320  .9766        800  .9986
 -1110  .0002       -630  (illeg.)    -150  .1024        330  .9784        810  .9988
 -1100  .0002       -620  .0046       -140  .1144        340  .9794        820  .9990
 -1090  .0002       -610  .0048       -130  .1280        350  .9806        830  .9990
 -1080  .0002       -600  .0052       -120  .1422        360  .9818        840  .9990
 -1070  .0002       -590  .0054       -110  .1602        370  .9828        850  .9990
 -1060  .0002       -580  .0054       -100  .1792        380  .9834        860  .9990
 -1050  .0002       -570  .0062        -90  .1968        390  .9840        870  .9990
 -1040  .0002       -560  .0068        -80  .2212        400  .9844        880  .9990
 -1030  .0004       -550  .0068        -70  .2512        410  .9848        890  .9990
 -1020  .0006       -540  .0070        -60  .2764        420  .9856        900  .9992
 -1010  .0008       -530  .0074        -50  .3132        430  .9876        910  .9992
 -1000  .0008       -520  .0080        -40  .3450        440  .9882        920  .9992
  -990  .0008       -510  .0086        -30  .3808        450  .9886        930  .9994
  -980  .0008       -500  .0090        -20  .4180        460  .9888        940  .9996
  -970  .0008       -490  .0094        -10  .4582        470  .9894        950  .9996
  -960  .0008       -480  .0100          0  .4970        480  .9896        960  .9996
  -950  .0008       -470  .0106         10  .5384        490  .9904        970  .9996
  -940  .0010       -460  .0118         20  .5790        500  .9910
  -930  .0010       -450  (illeg.)      30  .6186        510  .9914
  -920  .0010       -440  (illeg.)      40  .6562        520  .9920
  -910  .0010       -430  (illeg.)      50  .6886        530  .9920
  -900  .0012       -420  .0134         60  .7184        540  .9922
  -890  .0014       -410  .0136         70  .7460        550  .9926
  -880  .0014       -400  .0140         80  .7756        560  .9926
  -870  .0016       -390  .0150         90  .8004        570  .9932
  -860  .0016       -380  .0154        100  .8226        580  .9936
  -850  .0016       -370  .0164        110  .8448        590  .9936
  -840  .0016       -360  .0176        120  .8608        600  .9940
  -830  .0016       -350  .0182        130  .8752        610  .9942
  -820  .0016       -340  .0192        140  .8896        620  .9944
  -810  .0016       -330  .0198        150  .8990        630  .9948
  -800  .0018       -320  .0218        160  .9084        640  .9954
  -790  .0020       -310  .0236        170  .9156        650  .9956
  -780  .0020       -300  .0250        180  .9216        660  .9958
  -770  .0020       -290  .0276        190  .9312        670  .9958
  -760  .0022       -280  .0296        200  .9380        680  .9962
  -750  .0026       -270  .0322        210  .9434        690  .9962
  -740  .0026       -260  .0364        220  (illeg.)     700  .9968
  -730  .0028       -250  .0392        230  .9546        710  .9972
  -720  .0032       -240  .0436        240  .9586        720  .9976
  -710  .0032       -230  (illeg.)     250  .9624        730  .9980
  -700  .0034       -220  .0520        260  .9652        740  .9980
  -690  .0038       -210  .0582        270  .9682        750  .9984
  -680  .0040       -200  .0626        280  .9702        760  .9984
  -670  .0042       -190  .0684        290  .9722        770  .9986

DIFF = difference between two DEFT TOTAL scores; CUM = cumulative proportion of the 5000 differences at or below DIFF; (illeg.) = not legible in the source copy.
as "different" at the 0.10 probability level. A much more
realistic criterion is based on the restricted ranges
generated in Conditions 4 and 5, described earlier. Under
Condition 5, for example, a difference of 30 points in the
Total Score would place two devices one standard deviation
apart.
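Table 16 can be read mechanically as an empirical cumulative distribution. The sketch below copies only a handful of its rows and uses a step lookup (exact only at the tabled DIFF values); the function names are illustrative. It shows that a 150-point difference leaves roughly 0.10 of the simulated differences in each tail (.1024 at or below -150, and 1 - .8990 = .1010 above +150).

```python
from bisect import bisect_right

# Excerpt of Table 16 (Condition 3, 5000 trials): DIFF -> cumulative proportion.
DIFFS = [-200, -150, -100, -50, 0, 50, 100, 150, 200]
CUM = [0.0626, 0.1024, 0.1792, 0.3132, 0.4970, 0.6886, 0.8226, 0.8990, 0.9380]


def cum_at(d):
    """Cumulative proportion at the largest tabled DIFF that is <= d."""
    if not DIFFS[0] <= d <= DIFFS[-1]:
        raise ValueError("difference outside the tabled excerpt")
    return CUM[bisect_right(DIFFS, d) - 1]


def tail_proportions(d):
    """Share of simulated differences at or below -d, and above +d."""
    return cum_at(-d), 1.0 - cum_at(d)


lower, upper = tail_proportions(150)
print(lower, round(upper, 4))
```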
Stability Analyses 

The purpose of the stability analyses was to examine 
the impact of deviations from perfect reliability. It is 
normally assumed that a rather high degree of stability is 
necessary to demonstrate the validity of the measuring in- 
strument and/or the robustness of the effect being 
measured. Establishing the existence of the desired degree 
of stability is an empirical endeavor (e.g., through 
repeated observations of raters); nonetheless, Monte Carlo 
analyses can be used to hypothetically examine the poten- 
tial impact of instability. 

Two kinds of Monte Carlo analyses were performed. The 
scale bias analysis shows the impact of preferences for 
certain portions of the input scale. The two-judge random 
error analysis examines the effect of measurement error on 
apparent stability. Results of this analysis can be used 
for null hypothesis testing. 
Impact of scale bias. Table 17 summarizes the results of
the scale bias analysis, which investigates the impact of a 
rater's preference for any specific portion of the allow- 
able 1-100 scale. Inputs are assumed to be uniformly dis- 
tributed; each row in Table 17 represents a different range 
from which the values for all input variables are drawn. 
The first row, provided for comparison, shows intermediate 
and output variable results for the unbiased case, in which 
the entire 1-100 range is used. Subsequent rows show 
results for cases in which simulated judgments (i.e., input 
values) are confined to smaller portions of the scale. 
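One reason restricted scales shrink the spread of every DEFT index in Table 17 is already visible at the input level. Assuming, as a simplification, that each rating is a uniformly distributed integer on lo..hi, its standard deviation is sqrt(((hi - lo + 1)^2 - 1) / 12). The sketch computes this closed form and checks it with a quick simulation; for the full 1-100 scale it gives about 28.87, in line with the input standard deviations of roughly 28.6 to 29.0 reported in Table 18.

```python
import math
import random
import statistics


def uniform_int_sd(lo, hi):
    """Closed-form SD of a uniform integer rating on lo..hi."""
    n = hi - lo + 1
    return math.sqrt((n * n - 1) / 12)


def simulated_sd(lo, hi, trials=5000, seed=0):
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    return statistics.stdev([rng.randint(lo, hi) for _ in range(trials)])


for lo, hi in [(1, 100), (20, 40), (1, 50), (25, 75)]:
    print(f"{lo}-{hi}: {uniform_int_sd(lo, hi):.2f} "
          f"(simulated {simulated_sd(lo, hi):.2f})")
```

Narrowing the scale from 1-100 to 20-40 cuts the spread of every input by roughly a factor of five (28.87 to 6.06), which is consistent with the much smaller standard deviations in the restricted rows of Table 17.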

Two-judge random error analysis. As has already been
mentioned, Monte Carlo analysis cannot be used to determine
the degree of stability; this is an empirical question.
However, the impact of measurement error on apparent
stability can be investigated. In particular, suppose that
two judges are in agreement about all aspects of a device,
but, because of measurement error, their ratings do not
coincide perfectly. How does this affect their apparent
agreement?

To investigate this question, five sets of simulated 
DEFT model output were generated. The first set represents 
the "truth" in the form of 5,000 random applications of 
Table 17. SCALE BIAS ANALYSIS FOR DEFT
(UNIFORM INPUT DISTRIBUTIONS)

SCALE BIAS ANALYSIS FOR MODEL DEFT
5000 TRIALS

VARIABLE      TP          ACQ (A)         AD            TRP        TRANS (T)    TOTAL (A+T)
SCALE      MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV   MEAN ST DEV
1-100      25.3   22.3   46.2   58.2   16.5   23.2   42.2   31.7   77.1   88.3  123.3  105.4
1-20        1.1     .9    4.0    3.9    3.2    4.5    4.3    4.5   15.7   19.6   19.7   20.0
20-40       9.0    2.5   16.6    4.9    3.3    4.7   12.4    5.3   22.9   10.1   39.5   11.2
40-60      25.0    4.1   35.5    6.2    3.3    4.7   28.4    6.2   40.3    9.1   75.8   11.0
60-80      49.0    5.8   58.6    7.4    3.3    4.7   52.4    7.4   62.8    9.2  121.4   11.7
80-100     80.9    7.4   85.4    8.3    3.3    4.7   84.4    8.7   89.1    9.6  174.5   12.7
20-80      24.9   12.8   37.0   20.7   10.0   14.1   35.1   18.7   52.4   30.6   89.4   36.8
1-25        1.7    1.4    5.6    5.6    4.0    5.6    5.7    5.8   19.1   23.4   24.7   24.0
25-50      14.0    3.9   23.2    6.9    4.2    5.9   18.3    7.0   30.3   12.1   53.5   13.8
50-75      39.0    6.5   49.6    8.7    4.2    5.9   43.3    8.6   55.0   11.5  104.6   14.4
75-100        ?      ?      ?      ?      ?      ?   80.8   10.7   86.6   12.0  168.6   15.7
25-75      24.9   10.5   36.4   16.6    8.3   11.7   33.4   15.6   49.0   24.5   85.3   29.5
1-33        2.9    2.4    8.5    9.0    5.3    7.5    8.2    7.8   24.6   29.4   33.1   30.7
33-67      24.9    7.1   35.8   10.8    5.7    8.0   30.7   10.5   44.1   16.0   79.9   19.2
67-100     69.6   11.4   76.5   13.3    5.5    7.7   75.3   13.6   82.8   15.8  159.4   20.6
1-50        6.5    5.6   16.0   18.1    8.2   11.5   14.7   12.6   36.7   42.4   52.6   46.0
50-100     56.1   15.6   65.7   19.5    8.?   11.7   64.7   19.3   75.8   24.0  141.5   30.?

? = value or digit not legible in the source copy.
DEFT in which two judges in fact agree perfectly on each
and every input value. Table 18 summarizes this set of
output (generated under Condition 3). The other four sets
of DEFT output represent various kinds of "imperfection" in
the form of deviation about the "truth" values. Tables 19
through 22 summarize DEFT results for hypothetical judges
whose ratings (input values) vary randomly about the "true"
value.

In Tables 19 and 20, the random variation is uniform
over the interval true value ±5 (interval width 10); in
Tables 21 and 22, the variation is uniform over the
interval true value ±10 (interval width 20).
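The judge model described above can be sketched as follows. Two details are assumptions not stated in the text: ratings are treated as continuous, and a perturbed rating is clipped to the allowable 1-100 scale (suggested by the 1.00 minima and 100.00 maxima in Tables 19 through 22). Because each judge's error is uniform on ±h, the per-input difference between two otherwise agreeing judges is triangular on ±2h with standard deviation h * sqrt(2/3), about 4.08 for h = 5 before clipping.

```python
import random
import statistics


def judge_rating(true_value, half_width, rng):
    # Assumed error model: true rating plus uniform error on
    # [-half_width, +half_width], clipped to the 1-100 rating scale.
    noisy = true_value + rng.uniform(-half_width, half_width)
    return min(100.0, max(1.0, noisy))


def simulate_judges(n_inputs=5000, half_width=5, seed=0):
    rng = random.Random(seed)
    # "Truth": uniform integer ratings on 1-100 (Condition 3).
    truth = [rng.randint(1, 100) for _ in range(n_inputs)]
    judge1 = [judge_rating(t, half_width, rng) for t in truth]
    judge2 = [judge_rating(t, half_width, rng) for t in truth]
    return truth, judge1, judge2


truth, j1, j2 = simulate_judges(half_width=5)
per_input_diff = [a - b for a, b in zip(j1, j2)]
# Expected to be near 4.08; clipping at the scale ends pulls it slightly lower.
print(round(statistics.stdev(per_input_diff), 2))
```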

Table 23 summarizes distributions of differences in
DEFT TOTAL among the various data sets. The first row
(DIF10J1X) describes the variation of hypothetical Judge
1's DEFT TOTAL about "truth's" DEFT TOTAL when Judge 1 is
assumed to be reliable to ±5; the second row (DIF10J2X)
summarizes the same variation for hypothetical Judge 2.
The third row (DIF10J1J2) summarizes the distribution of
differences between Judge 1's and Judge 2's DEFT TOTALs when
the two judges are assumed to be in perfect agreement and
each is reliable to ±5. The fourth through sixth rows
repeat the first through third rows for hypothetical judges
that are reliable to ±10 (interval width 20).
Table 18. HYPOTHETICAL "TRUE" RESULTS FOR DEFT

DESCRIPTIVE STATISTICS FOR MODEL DEFT INTER-RATER ANALYSIS - TRUE VALUE
5000 TRIALS

NAME            MEAN   VARIANCE   ST DEV  MINIMUM  MAXIMUM
PD                 ?     841.17    29.00     1.00   100.00
D              50.25     817.00    28.58     1.00   100.00
R (AE)         50.78     818.28    28.61     1.00   100.00
RPD            51.64     832.52    28.85     1.00   100.00
RD (RLD)       49.77     832.99    28.86     1.00   100.00
PS             50.54     821.20    28.66     1.00   100.00
FS             50.2?     821.77    28.67     1.00   100.00
RR (TT)        50.80     826.48    28.75     1.00   100.00
TP             25.84     498.07    22.32      .03    98.00
ACQ (A)        48.23    3952.76    62.87      .04   809.90
AD             16.75     558.34    23.63      .00    98.00
TRP            42.50    1043.59    32.30      .06   175.49
TRANS (T)      78.00    8163.48    90.35      .06  1187.50
TOTAL (A+T)   126.23   12143.31   110.20      .98  1240.11

? = value or digit not legible in the source copy.
Table 19. RESULTS FOR HYPOTHETICAL JUDGE 1 -
DEVIATION OF ±5 FROM "TRUTH"

DESCRIPTIVE STATISTICS FOR MODEL DEFT INTER-RATER ANALYSIS - JUDGE #1 (INT WIDTH 10)
5000 TRIALS

NAME            MEAN   VARIANCE   ST DEV  MINIMUM  MAXIMUM
PD                 ?     838.02    28.95     1.00   100.00
D                  ?     811.60    28.49     1.00   100.00
R (AE)             ?     809.99    28.46     1.00   100.00
RPD                ?     825.46    28.73     1.00   100.00
RD (RLD)           ?     822.94    28.69     1.00   100.00
PS                 ?     815.46    28.56     1.00   100.00
FS                 ?     815.63    28.56     1.00   100.00
RR (TT)        50.93     822.00    28.67     1.00   100.00
TP             25.93     496.69        ?      .03    99.00
ACQ (A)        46.83    3161.89    56.23      .03   725.40
AD                 ?     552.63    23.51      .00    97.00
TRP            42.50    1035.41    32.18      .03   177.04
TRANS (T)      76.27    6686.59    81.77      .07   949.00
TOTAL (A+T)   123.10    9796.67    98.98     1.55  1088.63

? = value not legible in the source copy.
Table 20. RESULTS FOR HYPOTHETICAL JUDGE 2 -
DEVIATION OF ±5 FROM "TRUTH"

DESCRIPTIVE STATISTICS FOR MODEL DEFT INTER-RATER ANALYSIS - JUDGE #2 (INT WIDTH 10)
5000 TRIALS

NAME            MEAN   VARIANCE   ST DEV  MINIMUM  MAXIMUM
PD             51.31     832.??    28.85     1.00   100.00
D              50.29     812.63    28.51     1.00   100.00
R (AE)         50.80     813.37    28.52     1.00   100.00
RPD            51.64     828.28    28.78     1.00   100.00
RD (RLD)       49.86     829.39    28.80     1.00   100.00
PS             50.68     814.12    28.53     1.00   100.00
FS             50.35     816.74    28.58     1.00   100.00
RR (TT)        50.87     823.44    28.70     1.00   100.00
TP             25.90     494.84    22.24      .06   100.00
ACQ (A)        47.27    3221.30    56.76      .08   601.60
AD             16.70     557.85    23.62      .00    97.00
TRP            42.50    1039.92    32.2?      .07   175.65
TRANS (T)      76.53    6848.36    82.75      .07  1052.80
TOTAL (A+T)   123.80   10010.21   100.05     1.40  1107.82

? = digit not legible in the source copy.
Table 21. RESULTS FOR HYPOTHETICAL JUDGE 1 -
DEVIATION OF ±10 FROM "TRUTH"

DESCRIPTIVE STATISTICS FOR MODEL DEFT INTER-RATER ANALYSIS - JUDGE #1 (INT WIDTH 20)
5000 TRIALS

NAME            MEAN   VARIANCE   ST DEV  MINIMUM  MAXIMUM
PD             51.33     825.24    28.73     1.00   100.00
D              50.17     793.37    28.17     1.00   100.00
R (AE)         50.88          ?        ?     1.00   100.00
RPD                ?          ?        ?     1.00   100.00
RD (RLD)       49.??          ?    28.??     1.00   100.00
PS             50.70     808.54    28.43     1.00   100.00
FS             50.37     807.97    28.42     1.00   100.00
RR (TT)        50.84     805.47    28.38     1.00   100.00
TP             25.81     479.47    21.90      .01    98.01
ACQ (A)        46.15    2764.10    52.57      .02   558.00
AD             16.63     551.34    23.48      .00    97.00
TRP            42.49    1027.32    32.05      .02   187.05
TRANS (T)      74.97    6282.88    79.26      .03   985.60
TOTAL (A+T)   121.12    9132.09    95.56     1.65   988.53

? = value or digit not legible in the source copy.
Table 22. RESULTS FOR HYPOTHETICAL JUDGE 2 -
DEVIATION OF ±10 FROM "TRUTH"

DESCRIPTIVE STATISTICS FOR MODEL DEFT INTER-RATER ANALYSIS - JUDGE #2 (INT WIDTH 20)
5000 TRIALS

NAME            MEAN   VARIANCE   ST DEV  MINIMUM  MAXIMUM
PD                 ?          ?    28.71     1.00   100.00
D                  ?          ?    28.29     1.00   100.00
R (AE)             ?          ?    28.31     1.00   100.00
RPD                ?          ?    28.63     1.00   100.00
RD (RLD)           ?          ?    28.45     1.00   100.00
PS                 ?     806.58    28.40     1.00   100.00
FS             50.32     802.96    28.34     1.00   100.00
RR (TT)        50.88     806.74    28.40     1.00   100.00
TP             25.95     491.39    22.17      .06    99.00
ACQ (A)        46.25    2778.98    52.72      .08   631.45
AD             16.62     541.59    23.27      .00    95.00
TRP            42.51    1009.59    31.77      .06   185.20
TRANS (T)      75.06    5967.60    77.25      .06   901.20
TOTAL (A+T)   121.31    8693.74    93.24     1.26   907.58

? = value not legible in the source copy.
Table 23. DISTRIBUTIONS OF DEFT TOTAL DIFFERENCES

DESCRIPTIVE STATISTICS FOR MODEL DEFT INTER-RATER DIFFERENCES
5000 TRIALS

NAME           MEAN   VARIANCE   ST DEV   MINIMUM  MAXIMUM
DIF10J1X      -3.13    1689.11    41.10   -646.40   533.59
DIF10J2X      -2.43    1620.27    40.25   -623.61   458.34
DIF10J1J2      -.70    1633.79    40.42   -551.94   581.99
DIF20J1X      -5.11    3529.54    59.41   -755.65   586.10
DIF20J2X      -4.92    3024.65    55.00   -818.98   644.14
DIF20J1J2      -.19    3144.11    56.07   -641.92   703.01
The utility of this analysis is in its potential for
null hypothesis testing. Given two (real) judges rating
the same device, and a difference between their DEFT TOTAL
scores, we can determine the likelihood of a difference of
that magnitude or larger, given stability of ±5 or ±10 and
an assumption of no underlying disagreement. Since the
differences appear to be distributed normally (see Figures
2 through 7), this test can be made using the standard
normal distribution. Output of this analysis can also be
used to determine confidence intervals or credible intervals
about the DEFT TOTAL computed from one (real) judge's input
ratings, assuming stability of ±5 or ±10.
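The test described above can be sketched with the standard normal distribution, using standard deviations copied from Table 23 and treating the mean differences (-.70 and -.19 for the perfect-agreement rows) as zero. The function names and the 1.96 multiplier for an approximate 95 percent interval are illustrative choices, not part of DEFT.

```python
import math

# From Table 23: SD of DEFT TOTAL differences, keyed by reliability (±5, ±10).
SD_PAIR = {5: 40.42, 10: 56.07}       # DIF10J1J2, DIF20J1J2
SD_VS_TRUTH = {5: 41.10, 10: 59.41}   # DIF10J1X, DIF20J1X


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_sided_p(observed_diff, reliability):
    """P(|difference| >= observed) for judges who agree apart from rating error."""
    z = abs(observed_diff) / SD_PAIR[reliability]
    return 2.0 * (1.0 - normal_cdf(z))


def interval(total, reliability, z=1.96):
    """Approximate 95% interval about one judge's DEFT TOTAL."""
    half_width = z * SD_VS_TRUTH[reliability]
    return total - half_width, total + half_width


print(round(two_sided_p(80, 5), 3), round(two_sided_p(80, 10), 3))
print(interval(126.23, 5))
```

For example, if both judges were reliable to ±5 and in full agreement, their TOTALs would differ by 80 points or more only about 5 percent of the time (z of about 1.98); under ±10 reliability the same gap is unremarkable (about 15 percent).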
Figure 3. DISTRIBUTION OF DEFT DIFFERENCES FOR HYPOTHETICAL
JUDGE 2 (RELIABLE TO ± 5) VERSUS "TRUTH"
Figure 5. DISTRIBUTION OF DEFT TOTAL DIFFERENCES FOR
HYPOTHETICAL JUDGE 1 (RELIABLE TO ± 10) VERSUS "TRUTH"
Figure 6. DISTRIBUTION OF DEFT TOTAL DIFFERENCES FOR
HYPOTHETICAL JUDGE 2 (RELIABLE TO ± 10) VERSUS "TRUTH"
Figure 7. DISTRIBUTION OF DEFT TOTAL DIFFERENCES FOR
HYPOTHETICAL JUDGE 1 VERSUS HYPOTHETICAL JUDGE 2
(PERFECT AGREEMENT; BOTH RELIABLE TO ± 10)
3. Interrater Agreement

The purpose of this exercise was to determine the
degree of interrater agreement that could be achieved using
DEFT. This exercise also served as a "dry run" through the
DEFT procedures, in essence a "feasibility" study: Could
DEFT be used by various types of raters with more or less
familiarity with the selected training devices and more or
less familiarity with DEFT?

The method chosen was to have six raters use DEFT to
evaluate three training devices. Two of the training
devices were designed to train the same tasks and subtasks;
thus, we had a "comparative" evaluation. The third training
device was designed to train several different tasks, of
which we selected two. We chose this method (i.e., a
limited set of training devices and a limited set of
raters) rather than alternative approaches (e.g., many
raters-one training device, few raters-many training
devices, many raters-many training devices) primarily
because of time and resource constraints. However, we also
viewed this method as a "worst-case" test: if we could not
demonstrate agreement in this situation, we would not be
able to demonstrate agreement in less controlled
situations. Our method also constrained the use of
sophisticated statistical evaluations. For example,
correlations between raters over repeated measures on the
same rating scale could not be meaningfully interpreted
because of the small number of observations. Nonetheless,
descriptive statistics, such as mean differences across
raters, could provide sufficient information to determine
the feasibility and usefulness of DEFT.
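The "mean differences across raters" mentioned above can be computed, for instance, as a mean absolute difference over all rater pairs. A minimal sketch; the function name is illustrative, and the example uses the DEFT I Acquisition scores for BOT reported in Table 25 (Raters 1-4: 25.0, 38.8, 42.9, 31.5), which average about 10.2 points apart.

```python
from itertools import combinations
from statistics import mean


def mean_pairwise_difference(ratings):
    """Mean absolute difference over all pairs of raters."""
    return mean(abs(a - b) for a, b in combinations(ratings, 2))


# DEFT I Acquisition scores for BOT, Raters 1-4 (Table 25).
print(round(mean_pairwise_difference([25.0, 38.8, 42.9, 31.5]), 1))
```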

Method 

Devices and Tasks/Subtasks. Two armor gunnery training
devices were selected: the MK-60 Gunnery Trainer (VIGS)
and the burst-on-target (BOT) trainer. These two devices
were examined in the context of training a single gunnery
engagement, shown in Figure 8 (from Harris, Ford, Tufano,
& Wiggs, 1983). The third device selected was a maintenance
procedures simulator. It was selected because AIR staff
were intimately involved in its design, extensive materials
were available, and the tasks selected for evaluation were
similar to maintenance procedures contained in U.S. Army
tasks. Brief descriptions of the three devices and the
tasks and subtasks evaluated follow.
IDOC JOB OBJECTIVE 56
PLUS BOT

Precision, periscope, stationary firing tank, moving tank target
(1200-1600 meters), SABOT, direct fire adjustment (BOT)

GUNNER BEHAVIORAL ELEMENTS

1. Gunner indexes ammunition.
2. Gunner turns on main gun switch.
3. Gunner announces IDENTIFIED.
4. Gunner applies lead in direction of target apparent motion.
5. Gunner lays crosshair leadline at center of target vulnerability.
6. Gunner makes final precise lay.
7. Gunner announces ON THE WAY.
8. Gunner fires main gun.
9. Gunner announces sensing and BOT.
10. Gunner relays (BOT).
11. Gunner announces ON THE WAY (BOT).
12. Gunner fires main gun (BOT).

The gunnery engagement and gunner behaviors come from two sources:

1. Boldovici, J. A. (HumRRO), Boycan, G. G. (ARI), Fingerman, P. F., &
Wheaton, G. R. (AIR). M60A1AOS Tank Gunnery Data Handbook, ARI
Technical Report TR-79-A7, March 1979.

2. U.S. Army, FM 17-12, Tank Gunnery, March 1977.

FIGURE 8. GUNNERY ENGAGEMENT
MK-60 Gunnery Trainer (VIGS). The MK-60 is a portable
electronic training device designed to provide soldiers
with realistic and effective target engagement skills
training for both novice and experienced gunners. The
trainer combines video disc and microcomputer technologies.
The gunner sees targets through an optical system that
combines the image with a projected reticle, tracks the
target, and fires. Computer-generated graphics display hits,
misses, and tracers in the gunner's sight picture and on an
external monitor. Performance is assessed by the
microcomputer, and feedback is given to the gunner on a
black-and-white scorecard CRT at the gunner's console.

Burst-on-Target Trainer (BOT). The BOT uses simulated
tank turret controls and a gunner's sight for operation.
The trainer provides a target and reticle that move with
respect to each other when the controls are actuated. The
targets are various 35mm color slide transparencies of
enemy tanks or vehicles, projected into the sight picture
by a standard carousel projector mounted within the
trainer. When the gunner is on target and fires the main
gun, a flash of laser light simulates the round burst. The
instructor can view the scene and has independent control
of the burst position to simulate various trajectories.
The gunnery engagement shown in Figure 8 was selected
for several reasons. First, AIR staff were familiar with
it; second, excellent documentation was available; and
third, this engagement had previously been processed
through earlier versions of the TRAINVICE models (see
Harris et al., 1983).

The MPS Trainer. Materials drawn from AIR/Bedford
files for the period 1974-1983 were extracted and edited to 
describe the E-3A Navigation Computer System (NCS) and the 
Maintenance Procedure Simulator (MPS) for that system. The 
MPS was built by Honeywell to E-3A design specifications 
developed by AIR/Bedford. 

The MPS was designed and acquired to support training
in organizational (flightline) maintenance procedures for
the AN/ASN-118 NCS installed on the E-3A aircraft. The NCS
supplies navigation data to the aircraft flight control
system, the flight crew, and the radar data processing
group. The NCS incorporates a pair of redundant
CAROUSEL-type inertial navigation units, a single Doppler
system to measure altitude, and an Omega VLF
receiver/computer system to measure aircraft position.
Organizational maintenance of the NCS relies primarily upon
automatic fault detection and isolation performed by
built-in test equipment (BITE). Isolated faults are
corrected by removal and replacement of line-replaceable
units (LRUs) or substitution of faulty soldered components
(inductors, capacitors, filters).

The MPS is a computer-controlled trainer housed in a
single integrated console. Operation of the E-3A aircraft
AN/ASN-118 Navigation Computer System (NCS) is simulated
only to the extent needed to perform the required
organizational-level maintenance procedures for the NCS.
Faults in the NCS are simulated through the action of
computer software. Required maintenance actions such as
removal and replacement, connect and disconnect, and
inspection are simulated by the use of MPS controls rather
than actual operations.

During a normal training situation, the student 
operates controls of simulated aircraft and support equip- 
ment contained in the MPS. The computer software repeti- 
tively samples MPS control settings and causes the ap- 
propriate response to be displayed. Software response to 
the instructor/student actions can cause one or more of the 
following to occur: 
1) Change to one or more indicator displays

2) Removal or change of 35-mm slide displays

3) Teleprinter message

The MPS provides 273 training exercises that are used
to train entering E-3A maintenance technicians on NCS
system-specific operations and maintenance procedures.
Students entering the training course have completed basic
training and a general navigation course, which leads to the
award of a semi-skilled (3-level) rating in AFSC 328X4. Upon
graduation, students proceed to the E-3A Wing at Tinker
AFB, where they begin work on the flightline. They work
under supervision and receive additional on-the-job training.

Table 24 describes two "tasks" which are, in reality, 
two parts of one of the 273 exercises. The tasks selected 
for description are: (1) Checkout of the Inertial 
Navigation System (INS), and (2) Fault isolation of Fault 
10 (of 100). Two information packages were prepared. The 
first set represented each task as performed in conjunction 
with the operational equipment. The second set represented 
the same tasks as performed in conjunction with MPS. Both 
provide data formulated for direct entry into the 
computerized DEFT program. The data included descriptive 
Table 24. MPS and E-3A Tasks and Subtasks

Task 1: Checkout of Inertial Navigation System (INS)

Subtask Number   Subtask Description
10               Ensure E-3A aircraft power and cooling are available
20               Turn NCS Power on
30               Turn Autopilot off
40               Turn (2) probe heaters off
50               Synchronize (2) Horizontal Situation Indicators (HSI)
60               Set INS-1 and INS-2 to align mode
70               Test CDU displays and lamps
80               Detect Fault 10 (Performance index does not decrease from 9 to 5)

Task 2: Fault Isolation of Fault 10

81               Interchange CDU-1 and CDU-2 (Simulated on MPS)
82               Perform Checkout (Task 1: 10-80)
83               Interchange INU-1 and INU-2 (Simulated on MPS)
84               Perform Checkout (Task 1: 10-80)
85               Check 115 VAC Power
86               Check wiring continuity (resistance)
87               Replace shorted capacitor (Simulated on MPS)
88               Perform Checkout (Task 1: 10-80)
text for each subtask and the controls, displays, skills,
and knowledge associated with the subtask. Task 1 was
detailed only to the level required to link Task 1 and 
Task 2. The details of the subtasks were greatly ab- 
breviated to reduce or eliminate redundancy of activities 
which are required by the actual procedures, both for the 
operational equipment and for the trainer. Photographs and 
accompanying text were provided to indicate location of 
equipment; a listing of the associated displays and con- 
trols was also provided. 

Raters. Six AIR staff members participated in this 
study. These raters had differing degrees of familiarity 
with each of the training devices, tasks, and DEFT itself: 

Raters 1 and 2: Very familiar with DEFT and BOT; familiar
                with VIGS; unfamiliar with MPS

Raters 3 and 4: Unfamiliar with DEFT, BOT, and VIGS;
                very familiar with MPS

Raters 5 and 6: Familiar with DEFT, BOT, and VIGS;
                unfamiliar with MPS.

We planned to examine the impact of these differences 
on the various DEFT ratings and outputs. 
Procedure. Packages of materials were prepared for
each training device. The packages varied in the quality
and quantity of information provided. Thus, the BOT
"package" consisted of a picture of the device, a brief
engineering description, and the list of tasks and subtasks
involved. The VIGS package was the actual device user's
manual, complete with pictures, instructions for use, and
capabilities of the device. The MPS package contained
scores of pictures, descriptions, engineering
specifications, extracts from the Technical Manual used by
actual crewmen on the E-3A aircraft, and the user's manual
for MPS.

Following the distribution of these packages to each
of the raters, Raters 1-5 met to discuss the packages and
to receive instruction on how to use DEFT. It was decided
that the sparse information available regarding the BOT
device would be inadequate for the purposes of this study.
(Although in a "real-world" application training device
evaluators might face similar problems, i.e., a lack of
detailed information, our primary purpose was to determine
interrater agreement. If each rater supplied his own set of
assumptions regarding, e.g., training proficiency
standards, differences in ratings could not be attributed
to disagreements regarding DEFT.) Thus, the raters were
briefed on the details of BOT, both as performed on the
training devices (BOT and VIGS) and as performed on the M60
tank. In addition, raters were briefed in detail on the
E-3A and MPS configurations for the tasks under
investigation.

DEFT was presented and discussed at the "mechanical"
level; that is, raters were told how to operate the
computer and how to proceed through the DEFT analyses.
There was no discussion of the meaning or interpretation of
the various judgments and scales; we hoped that the
information provided on the screen would be sufficient.

Following this meeting, each rater was given a DEFT
program diskette and a data diskette containing the
necessary data bases. Each rater then processed each of the
three training devices through all three levels of DEFT.
Raters analyzed BOT first, VIGS second, and MPS third,
completing all DEFT analyses on each device before
analyzing the next device.

At the completion of these analyses, the data
diskettes were collected and the raw data scanned. A
cursory examination of these data revealed that the
information contained on the DEFT screens and the briefings
held prior to the analyses were inadequate. Examination of
the notes each rater kept regarding his ratings indicated
that each was operating under a different set of
assumptions. These differences ranged from data entry
conventions (e.g., if a Training Principle in the
Acquisition Efficiency analyses of DEFT III was judged to be
"not applicable," some raters entered "0," others entered
"100," and others entered "999") to different assumptions
regarding trainee characteristics (e.g., some raters thought
the trainees for the MPS device were skilled maintenance
crewmen, others thought they were naive crewmen, and still
others thought they were naive graduates of a Technical
School with no aircraft experience). Thus, it was decided to
reconvene the raters to discuss the devices and clarify
assumptions. Following these discussions, changes in ratings
were re-entered by the individual raters. Because of
logistic constraints, Raters 5 and 6 could not attend this
meeting; therefore, their results were not included in
further analyses.

Results 

Output indexes. At each level of DEFT, seven output
indexes are computed for a training device evaluation
(although different numbers and types of ratings are
involved in the different DEFT levels). These seven are:

1) Training Problem (TP)

2) Acquisition Efficiency (AE)

3) Acquisition (A); computed as TP/AE

4) Transfer Problem (TRP)

5) Transfer Efficiency (TE)

6) Transfer (T); computed as TRP/TE

7) Total Score; computed as A + T
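The arithmetic behind the three derived indexes can be sketched in a few lines of Python. This is a minimal illustration that assumes only the formulas listed above (A = TP/AE, T = TRP/TE, Total = A + T); the `DeftIndexes` class and its field names are hypothetical and are not taken from the DEFT program.

```python
from dataclasses import dataclass


@dataclass
class DeftIndexes:
    """Hypothetical container for the four rated components of one evaluation."""
    tp: float   # Training Problem
    ae: float   # Acquisition Efficiency (0-1)
    trp: float  # Transfer Problem
    te: float   # Transfer Efficiency (0-1)

    @property
    def acquisition(self) -> float:
        return self.tp / self.ae                  # A = TP / AE

    @property
    def transfer(self) -> float:
        return self.trp / self.te                 # T = TRP / TE

    @property
    def total(self) -> float:
        return self.acquisition + self.transfer   # Total = A + T


# Rater 1, BOT, DEFT I (components as listed in Table 25)
r1 = DeftIndexes(tp=20.0, ae=0.80, trp=22.7, te=0.25)
print(round(r1.acquisition, 1), round(r1.transfer, 1), round(r1.total, 1))
```

From the tabled components this gives A = 25.0, T = 90.8, and Total = 115.8; Table 25 lists 91.0 and 116.0, presumably because the displayed components are themselves rounded.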

Theoretically, these indexes should be equivalent across
all three levels of DEFT for a particular training device
evaluation, since the successively more detailed levels of
DEFT are designed to be componential assessments of more
global judgments. Thus, the first question we will examine
is whether raters were "internally consistent": for each
index on each training device, do the scores for the
different levels of DEFT agree?

Relevant data are shown in Tables 25-27. Table 25
shows the obtained indexes for each rater on the BOT device
for all levels of DEFT; Table 26 shows the same information
for the VIGS device; and Table 27 shows the same information
for the MPS device. Note that these data were obtained
after the second meeting of the raters, at which the
assumptions involved and the interpretations of the scales
were discussed.
TABLE 25. DEFT INDEX VALUES: BOT

                          TP      AE       A     TRP      TE       T   TOTAL
Rater 1   DEFT I        20.0    0.80    25.0    22.7    0.25    91.0   116.0
          DEFT II       37.5    0.85    44.1    55.0    0.42   130.9   175.0
          DEFT III      19.7    0.67    29.4    27.4    0.30    91.3   120.7

Rater 2   DEFT I        26.0    0.67    38.8    21.0    0.25    84.0   122.8
          DEFT II       35.0    0.81    43.2    62.5    0.50   125.0   168.2
          DEFT III      17.0    0.48    35.4    33.1    0.21   157.0   192.4

Rater 3   DEFT I        30.0    0.70    42.9    28.0    0.40    70.0   112.9
          DEFT II       40.0    0.81    49.4    57.5    0.40   143.7   193.1
          DEFT III      15.9    0.46    34.6    16.0    0.33    50.9    85.5

Rater 4   DEFT I        25.0    0.80    31.5    26.2    0.35    75.0   106.5
          DEFT II       35.0    0.76    46.1    70.0    0.45   155.6   201.7
          DEFT III      17.7    0.66    26.8    33.8    0.10   333.0   359.8

TP = Training Problem; AE = Acquisition Efficiency; A = Acquisition;
TRP = Transfer Problem; TE = Transfer Efficiency; T = Transfer.
TABLE 26. DEFT INDEX VALUES: VIGS

                          TP      AE       A     TRP      TE       T   TOTAL
Rater 1   DEFT I        17.5    0.90    19.4     6.0    0.50    12.0    31.4
          DEFT II       42.5    0.78    54.5    35.0    0.77    45.5   100.0
          DEFT III      22.9    0.70    32.7    10.4    0.51    20.4    53.1

Rater 2   DEFT I        24.0    0.85    28.2     6.6    0.70     9.4    37.6
          DEFT II       50.0    0.75    66.7    29.0    0.92    31.5    98.2
          DEFT III      23.5    0.60    39.2     9.2    0.54    17.0    56.2

Rater 3   DEFT I        25.0    0.80    31.2    10.0    0.60    16.7    47.9
          DEFT II       55.0    0.83    66.3    47.5    0.77    61.7   128.0
          DEFT III      19.0    0.42    45.2    10.0    0.46    21.7    66.9

Rater 4   DEFT I        20.0    0.80    25.0    15.0    0.50    30.0    55.0
          DEFT II       45.0    0.88    51.1    50.0    0.85    58.8   109.9
          DEFT III      20.8    0.79    26.3    18.2    0.39    46.7    73.0
TABLE 27. DEFT INDEX VALUES: MPS

                          TP      AE       A     TRP      TE       T   TOTAL
Rater 1   DEFT I        18.0    0.80    22.5     9.0    0.40    22.5    45.0
          DEFT II       25.0    0.80    31.2    20.0    0.43    46.5    77.7
          DEFT III       5.6    0.89     6.3     7.7    0.50    15.5    21.8

Rater 2   DEFT I        16.7    0.60    27.9     6.0    0.50    12.0    39.9
          DEFT II       30.0    0.76    39.5    35.0    0.46    76.1   115.6
          DEFT III       6.3    0.47    13.4     7.2    0.24    30.2    43.6

Rater 3   DEFT I        24.0    0.70    34.3    14.0    0.60    23.3    57.6
          DEFT II       40.0    0.80    50.0    30.0    0.57    52.6   102.6
          DEFT III       6.7    0.61    11.1    15.0    0.40    37.5    48.6

Rater 4   DEFT I        16.0    0.55    29.1     6.0    0.65     9.2    38.3
          DEFT II       22.5    0.83    27.1    27.5    0.47    58.5    85.6
          DEFT III       9.5    0.81    11.7     7.3    0.38    19.2    30.9
The logical question to ask first is what an
"acceptable" level of internal consistency would be: how
close to one another should these indexes be? This is an
arbitrary decision; however, considering the results of the
Monte Carlo analyses discussed in previous sections, it is
clear that the data shown in these tables for DEFT I and
DEFT III are internally consistent. Of the 84 pairs
(3 devices x 4 raters x 7 indexes) of DEFT I and DEFT III
indexes, 70 (83.3%) are within 20 points of each other, and
about half are within 10 points of each other.
Furthermore, most of the large disagreements are due to
arithmetic combinations of smaller disagreements. For
example, consider Rater 2, BOT:



                   DEFT I    DEFT III
TRP                 21.0       33.1
TE                  0.25       0.21
T                   84.0      157.0
Total Score        122.8      192.4



The relatively small difference in TRP is compounded by
the very small difference in TE to produce large
differences in T and Total Score. This might also have been
anticipated from the Monte Carlo sensitivity analyses:
small differences in the Efficiency indexes have large
effects on the summary indexes. If these cumulative
differences are taken into account, the DEFT I and DEFT III
indexes appear to be internally consistent.
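Because T = TRP/TE, the ratio of two Transfer values factors exactly into a TRP ratio times an inverse TE ratio, and the local sensitivity of T to TE is dT/dTE = -TRP/TE^2. The minimal Python sketch below (function names are ours) applies both to the Rater 2 BOT components quoted above.

```python
def transfer(trp: float, te: float) -> float:
    """T = TRP / TE."""
    return trp / te


def dT_dTE(trp: float, te: float) -> float:
    """Partial derivative of T with respect to TE: -TRP / TE**2."""
    return -trp / te ** 2


t1 = transfer(21.0, 0.25)   # DEFT I components, Rater 2, BOT
t3 = transfer(33.1, 0.21)   # DEFT III components, Rater 2, BOT

trp_ratio = 33.1 / 21.0     # TRP grows by about 58%
te_ratio = 0.25 / 0.21      # 1/TE grows by about 19% (TE is a divisor)
print(f"T: {t1:.1f} vs {t3:.1f}; ratio {t3 / t1:.3f} = {trp_ratio:.3f} x {te_ratio:.3f}")

# Near TE = 0.21, a 0.01 change in TE moves T by roughly 7.5 points.
print(f"dT/dTE at the DEFT III values: {dT_dTE(33.1, 0.21):.1f}")
```

Recomputed from the tabled components, the DEFT III value of T is about 157.6 (Table 25 lists 157.0, presumably from unrounded inputs). The two modest differences multiply to a T roughly 88% larger, and the derivative shows why a few hundredths on an Efficiency scale matter so much when TE is small.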

On the other hand, DEFT II indexes are substantially
higher than either DEFT I or DEFT III in practically all
cases. A closer examination of the data reveals that the
problem lies mainly with the TP and TRP indexes (the
Training and Transfer Problems, respectively). Each is
approximately twice as large for DEFT II as for the other
two levels.

This anomaly can be explained by examining how these
indexes are derived for DEFT II as compared to DEFT I and
DEFT III. In both of the latter cases, TP and TRP are
multiplicative functions of two ratings: Performance
Deficit and Performance Difficulty. Thus, in DEFT I, if a
training device objective is judged to contain 50% skills
and knowledge not possessed by trainees, and these skills
and knowledge are judged to be moderately difficult to
learn (e.g., they are rated at "50" on the Performance
Difficulty scale), the TP score will be
(50 x 50)/100 = 25. In DEFT II, however, the Performance
Deficit judgment is a simple "yes" or "no" (can do or
can't do) for each task contained in the training
objective, so the multiplicative combination of deficit and
difficulty is not contained in DEFT II. In fact, when the
DEFT II indexes are modified by incorporating either DEFT I
or DEFT III Performance Deficit ratings, the DEFT II indexes
dovetail precisely with the other indexes. (These
recalculated indexes are not shown.)
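The contrast can be illustrated with a small Python sketch. The DEFT I/III function follows the worked example above, TP = (deficit x difficulty)/100. The DEFT II side is a hypothetical simplification, not the documented DEFT II computation: we assume that tasks answered "can't do" simply contribute their difficulty rating with no deficit weighting, and that rescaling by a DEFT I-style deficit percentage restores agreement, as the text reports for the recalculated indexes.

```python
from statistics import mean


def tp_multiplicative(deficit_pct: float, difficulty: float) -> float:
    """DEFT I / DEFT III style: product of two 0-100 ratings, rescaled to 0-100."""
    return deficit_pct * difficulty / 100


def tp_binary_sketch(tasks: list[tuple[bool, float]]) -> float:
    """Hypothetical DEFT II-style score: mean difficulty of the tasks marked
    "can't do" (True), with no deficit weighting."""
    return mean(difficulty for cant_do, difficulty in tasks if cant_do)


def tp_binary_reweighted(tasks: list[tuple[bool, float]], deficit_pct: float) -> float:
    """The same hypothetical score rescaled by a DEFT I-style deficit percentage."""
    return tp_binary_sketch(tasks) * deficit_pct / 100


# Objective with four tasks of moderate difficulty; trainees cannot yet do two.
tasks = [(True, 50.0), (True, 50.0), (False, 50.0), (False, 50.0)]

print(tp_multiplicative(50, 50))          # DEFT I style
print(tp_binary_sketch(tasks))            # binary deficit, no weighting
print(tp_binary_reweighted(tasks, 50))    # reweighted by a 50% deficit
```

Under these assumptions the unweighted binary score is twice the multiplicative one (50 versus 25), and reweighting by the 50% deficit brings it back to 25.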

The other, relatively minor, inconsistencies in these
data are in the Efficiency indexes (AE and TE) of DEFT III.
In most cases (19 out of 24), the DEFT III Efficiency
indexes are the lowest of the three (although in most cases
these differences are quite small). In post-rating
discussions, the raters felt that this was partially due
to an "oversegmentation" problem: many of the eleven
Training Efficiency and eight Transfer Efficiency
principles received quite low ratings when applied to
subtasks. For example, augmented feedback for a relatively
trivial subtask such as "Indexes ammunition" would quite
reasonably not be included as an instructional feature of
the VIGS device; nevertheless, VIGS was "penalized" with a
low Efficiency rating for this principle.
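The "19 out of 24" count can be checked directly against Tables 25-27. The sketch below transcribes each rater's (DEFT I, DEFT II, DEFT III) Efficiency values and counts the cases in which DEFT III is strictly the lowest of the three; the data layout and function name are our own.

```python
# (DEFT I, DEFT II, DEFT III) values for Raters 1-4, transcribed from Tables 25-27.
EFFICIENCY = {
    ("BOT", "AE"):  [(0.80, 0.85, 0.67), (0.67, 0.81, 0.48), (0.70, 0.81, 0.46), (0.80, 0.76, 0.66)],
    ("BOT", "TE"):  [(0.25, 0.42, 0.30), (0.25, 0.50, 0.21), (0.40, 0.40, 0.33), (0.35, 0.45, 0.10)],
    ("VIGS", "AE"): [(0.90, 0.78, 0.70), (0.85, 0.75, 0.60), (0.80, 0.83, 0.42), (0.80, 0.88, 0.79)],
    ("VIGS", "TE"): [(0.50, 0.77, 0.51), (0.70, 0.92, 0.54), (0.60, 0.77, 0.46), (0.50, 0.85, 0.39)],
    ("MPS", "AE"):  [(0.80, 0.80, 0.89), (0.60, 0.76, 0.47), (0.70, 0.80, 0.61), (0.55, 0.83, 0.81)],
    ("MPS", "TE"):  [(0.40, 0.43, 0.50), (0.50, 0.46, 0.24), (0.60, 0.57, 0.40), (0.65, 0.47, 0.38)],
}


def count_deft3_lowest(data):
    """Return (cases where DEFT III < both DEFT I and DEFT II, total cases)."""
    triples = [t for rows in data.values() for t in rows]
    lowest = sum(1 for d1, d2, d3 in triples if d3 < min(d1, d2))
    return lowest, len(triples)


lowest, total = count_deft3_lowest(EFFICIENCY)
print(f"DEFT III lowest in {lowest} of {total} cases")
```

The five exceptions are BOT TE (Rater 1), VIGS TE (Rater 1), MPS AE (Raters 1 and 4), and MPS TE (Rater 1).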

Part of this problem is a terminological artifact of
the particular tasks and subtasks selected for this study.
While we termed "Indexes ammunition" a subtask, in standard
task analyses it would probably be considered a "step" or a
"behavioral element." Resolving the Efficiency index problem
will involve either "tightening up" DEFT input requirements
(e.g., by specifying task-analytic procedures and
definitions for determining "tasks" and "subtasks") or
conducting DEFT III Efficiency analyses at the task level.

The next question that can be addressed by examining
these data is interrater agreement within and across
devices for these indexes: for example, do raters agree on
the TP value for VIGS? Again, what constitutes "agreement"
must be decided arbitrarily. Standard correlational
techniques are not meaningfully interpretable with sample
sizes this small, so we will examine interrater agreement
descriptively.

When one closely examines Tables 25-27, one can only
be impressed by the equivalence of the indexes across
raters for all three training devices. With the exception
of the Total Score and an occasional "deviant" point, all
indexes are within a few points of one another. Considering
the range of values that these indexes can take and the
expected magnitude of difference scores as demonstrated by
the Monte Carlo analyses, this correspondence is excellent.
If the 100-point scales were converted to discrete 5- or
7-point scales, interrater agreement would be almost
perfect.
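The effect of coarser scales can be sketched by mapping 100-point ratings onto k equal-width categories. The equal-width binning rule and the four ratings below are hypothetical choices for illustration, not DEFT conventions. The sketch reports the mean absolute pairwise difference on the raw scale and the proportion of rater pairs that land in the same, or an adjacent, category.

```python
from itertools import combinations


def to_k_point(rating: float, k: int = 5) -> int:
    """Map a 0-100 rating onto categories 1..k using equal-width bins
    (hypothetical convention; a rating of 100 falls in the top category)."""
    return min(int(rating * k / 100) + 1, k)


def agreement(ratings: list[float], k: int = 5) -> dict:
    """Pairwise agreement among raters on one scale judgment."""
    pairs = list(combinations(ratings, 2))
    binned = [(to_k_point(a, k), to_k_point(b, k)) for a, b in pairs]
    n = len(pairs)
    return {
        "mean_abs_diff": sum(abs(a - b) for a, b in pairs) / n,
        "exact": sum(a == b for a, b in binned) / n,
        "within_one": sum(abs(a - b) <= 1 for a, b in binned) / n,
    }


ratings = [62, 70, 66, 58]   # four raters, one scale judgment (hypothetical)
print(agreement(ratings, k=5))
print(agreement(ratings, k=7))
```

With k = 5, these ratings (mean absolute difference of about 6.7 points) agree exactly for half of the rater pairs and within one category for all of them; with k = 7 they all fall in the same category. Exact agreement after binning therefore also depends on where the category boundaries happen to fall.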

Again, we must note that these data were obtained
following a discussion among the raters; this discussion
undoubtedly pulled the ratings closer together. (Countering
this, however, is the fact that the discussion concerned the
rating scales, not the summary indexes.) The picture of
interrater agreement prior to the discussion, while still
quite good, was not quite so rosy. As was mentioned
previously, differing interpretations and rating conventions
(particularly with respect to scoring rules for the
Efficiency scales) resulted in many index values that were
not comparable. For example, when a Training Principle was
judged "not applicable," some raters scored the scale as
"0," others as "100," and others as "999." Clearly, it
would not make sense to compare indexes derived by these
different raters.
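Why mixed "not applicable" codes make indexes incomparable can be shown with a hypothetical aggregation. Purely for illustration, we assume that an Efficiency index is the mean of 0-100 principle ratings rescaled to 0-1; the actual DEFT formula is not given here. An explicit missing-value marker (`None` below), agreed on before rating and dropped before averaging, avoids the ambiguity, since 0 and 100 are also legitimate ratings.

```python
from statistics import mean

NOT_APPLICABLE = None  # explicit marker agreed on before rating (hypothetical convention)


def efficiency_as_entered(ratings, na_code):
    """Hypothetical index when "not applicable" was typed in as a number."""
    return mean(na_code if r is NOT_APPLICABLE else r for r in ratings) / 100


def efficiency(ratings):
    """Hypothetical index with "not applicable" principles excluded."""
    return mean(r for r in ratings if r is not NOT_APPLICABLE) / 100


# One principle judged not applicable, three rated.
ratings = [80, 60, NOT_APPLICABLE, 70]
for code in (0, 100, 999):
    print(f"NA entered as {code}: {efficiency_as_entered(ratings, code):.3f}")
print(f"NA excluded: {efficiency(ratings):.3f}")
```

The same underlying judgments yield 0.525, 0.775, or an impossible value above 1 depending on the rater's convention, against 0.70 when the principle is simply excluded.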

The major discrepancy in these comparisons is the
disagreement in the Total Scores. Paralleling the above
discussion, we attribute these differences to the
cumulative effects of smaller differences in the individual
component indexes; furthermore, many of the Total Score
differences can be traced to the large impact of the
Efficiency indexes.
One possible solution, as suggested by the Monte Carlo
analyses, is to transform the Efficiency indexes (e.g., by
taking a square root). While this reduces the problem, it
does not eliminate it; however, this manipulation, together
with the suggestion to conduct DEFT III Efficiency analyses
at the task (rather than the subtask) level, would produce
substantial convergence in Total Scores.
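A minimal sketch of the square-root transform follows. We assume (our assumption; the exact form adopted in DEFT is not specified here) that the transform is applied to the Efficiency index in the denominator, so that T = TRP/sqrt(TE). The sketch compares the DEFT III to DEFT I ratio of T for the two BOT raters whose T values disagree most in Table 25.

```python
from math import sqrt


def transfer_raw(trp: float, te: float) -> float:
    return trp / te


def transfer_sqrt(trp: float, te: float) -> float:
    # Assumed form of the transform: divide by sqrt(TE) instead of TE.
    return trp / sqrt(te)


# (TRP, TE) for DEFT I and DEFT III, BOT device, from Table 25.
cases = {
    "Rater 2": ((21.0, 0.25), (33.1, 0.21)),
    "Rater 4": ((26.2, 0.35), (33.8, 0.10)),
}

for rater, (deft1, deft3) in cases.items():
    raw = transfer_raw(*deft3) / transfer_raw(*deft1)
    rooted = transfer_sqrt(*deft3) / transfer_sqrt(*deft1)
    print(f"{rater}: DEFT III / DEFT I ratio of T  raw {raw:.2f}  sqrt {rooted:.2f}")
```

Under this assumed form the ratio shrinks from about 1.88 to 1.72 for Rater 2 and from about 4.52 to 2.41 for Rater 4: a reduction, not an elimination, of the disagreement.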

In summary, these data indicate substantial interrater
agreement for all DEFT indexes across the three devices.
This is all the more encouraging considering, first, that
the raters had different degrees of familiarity with DEFT
and the three devices and, second, that the three devices
were of quite different sorts. The next issue to examine is
whether this level of interrater agreement is maintained
when the individual scales are examined.

Individual scales. Table 28 shows the average pairwise
differences among the four raters for each of the eight
DEFT I scales. These figures were computed by taking the
absolute difference between each pair of raters on each
scale judgment, summing these differences, and calculating
a mean and standard deviation. Since all raters rated all
dimensions, six differences (one for each of the six
possible pairs among four raters) were combined for each
entry in the table. Row and column means of these mean
differences are also shown.
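The computation behind each table entry can be sketched as follows: form all six pairs of the four raters' ratings on one scale, take absolute differences, and summarize them. We assume a population standard deviation (divisor n); this is our reading rather than a documented choice, although with the hypothetical ratings below it yields 5.00 (2.89), a combination that appears in Table 28.

```python
from itertools import combinations
from statistics import mean, pstdev


def paired_rater_differences(ratings):
    """Absolute differences for every pair of raters on one scale judgment."""
    return [abs(a - b) for a, b in combinations(ratings, 2)]


def summarize(ratings):
    """Number of pairs, mean difference, and population SD of the differences."""
    diffs = paired_rater_differences(ratings)
    return len(diffs), mean(diffs), pstdev(diffs)


n_pairs, m, sd = summarize([50, 55, 55, 60])   # four raters, hypothetical ratings
print(f"{n_pairs} pairs, mean {m:.2f}, SD ({sd:.2f})")
```

The six differences here are 5, 5, 10, 0, 5, and 5, giving a mean of 5.00 and a population SD of 2.89; a sample SD (divisor n - 1) would give 3.16 instead.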
TABLE 28. MEANS AND STANDARD DEVIATIONS OF PAIRED RATER COMPARISONS
FOR EACH TRAINING DEVICE - DEFT I

                                            Question
                   PD      LD      TA      RD     RLD      PS      FS       H    Mean

BOT     Mean    11.67     0.0    8.17    5.00    5.00    5.33    9.17    9.17    6.69
        SD     (6.83)   (0.0)  (4.95)  (2.89)  (2.89)  (2.63)  (5.34)  (5.34)  (1.80)

E3A     Mean    12.17    5.83   14.17    5.00    9.17   10.00   12.50   14.17   10.38
        SD     (7.06)  (3.44)  (6.72)  (5.00)  (5.34)  (7.07)  (9.47)  (6.72)  (3.02)

VIGS    Mean     9.17   10.00    5.83    5.00   12.83   13.33   10.00   11.67    9.73
        SD     (5.34)  (5.77)  (3.44)  (5.00)  (8.15)  (6.24)  (7.07)  (6.87)  (2.58)

GRAND   Mean     11.0    5.28    9.39    5.00    9.00    9.56   10.56   11.67    8.93
As could be surmised from the discussion above
concerning the output indexes, interrater agreement for
each of the underlying scales was also quite substantial.
Overall, the average disagreement was approximately 9
points (on a 100-point scale), well within what could be
considered acceptable levels of agreement. For the
individual scales, the average disagreement ranged from
5.00 to 11.67 points, with no particular scale showing an
unusually high level of disagreement. Likewise, the three
devices showed broadly similar levels of agreement (average
disagreements of 6.69, 10.38, and 9.73 points for BOT, E3A,
and VIGS, respectively).
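The margins of Table 28 follow directly from its cell means. The short sketch below, with values transcribed from the table, recomputes the per-device and per-scale means and the overall mean disagreement; the data layout is ours.

```python
from statistics import mean

SCALES = ["PD", "LD", "TA", "RD", "RLD", "PS", "FS", "H"]
# Mean paired-rater differences from Table 28 (DEFT I), one row per device.
TABLE_28 = {
    "BOT":  [11.67, 0.00, 8.17, 5.00, 5.00, 5.33, 9.17, 9.17],
    "E3A":  [12.17, 5.83, 14.17, 5.00, 9.17, 10.00, 12.50, 14.17],
    "VIGS": [9.17, 10.00, 5.83, 5.00, 12.83, 13.33, 10.00, 11.67],
}

device_means = {dev: mean(row) for dev, row in TABLE_28.items()}
scale_means = {s: mean(col) for s, col in zip(SCALES, zip(*TABLE_28.values()))}
overall = mean(v for row in TABLE_28.values() for v in row)

print({d: round(v, 2) for d, v in device_means.items()})
print({s: round(v, 2) for s, v in scale_means.items()})
print(round(overall, 2))
```

The recomputed values match the table's margins to rounding: device means of 6.69, 10.38, and 9.73, scale means from 5.00 (RD) to 11.67 (H), and an overall mean of 8.93 points.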

Tables 29 and 30 show the equivalent data for DEFT II
and DEFT III. Again, with minor discrepancies, interrater
agreement was high for all scales of both DEFT models on
all three devices. The conclusions to draw from these
tables are the same as those drawn above for the summary
indexes: interrater agreement for DEFT is encouragingly
high, especially given the differences among raters in
their familiarity with DEFT and the three devices, and the
level of interrater agreement demonstrated would support
the continued development and use of DEFT for the
evaluation of training-device-based training systems.
TABLE 29. MEANS AND STANDARD DEVIATIONS OF PAIRED RATER COMPARISONS
FOR EACH TRAINING DEVICE - DEFT II

                                          Question
                          [?]      LD      RD     RLD      PS      FS    Mean

BOT     Task 1  Mean      0.0   10.83     0.0   11.67   10.83    7.50    6.81
                SD      (0.0)  (5.34)   (0.0)  (6.87)  (5.34)  (4.79)
        Task 2  Mean      0.0     5.0     0.0     5.0     5.0   13.33    4.72
                SD      (0.0)   (5.0)   (0.0)  (2.89)   (5.0)  (9.43)

E3A     Task 1  Mean      0.0    8.33     0.0    2.50    10.0    10.0    5.14
                SD      (0.0)  (5.53)   (0.0)  (2.50)  (7.07)  (5.77)
        Task 2  Mean      0.0   12.50     0.0    10.0   11.67   18.33    8.75
                SD      (0.0)  (7.50)   (0.0)  (7.07)  (6.87) (10.67)

VIGS    Task 1  Mean      0.0    9.33     0.0    15.0   11.67   16.67    8.78
                SD      (0.0)  (5.31)   (0.0)  (8.66)  (6.87)   (7.?)
        Task 2  Mean      0.0   10.17     0.0   10.17   12.50   15.00    7.97
                SD      (0.0)  (5.87)   (0.0)  (5.27)  (7.50)  (7.64)

All     tasks   Mean      0.0    9.36     0.0    9.06   10.28   13.47    7.03
                SD      (0.0)  (2.56)   (0.0)  (4.55)  (2.72)  (4.10)


          Acquisition    Transfer
Device    Efficiency     Efficiency     Mean
BOT       2.50 (2.50)    5.17 (2.99)    3.84
E3A       3.50 (2.18)    6.83 (4.76)    5.17
VIGS      7.08 (3.36)    8.83 (5.15)    7.95

Entries are means, with standard deviations in parentheses.
TABLE 30. MEANS AND STANDARD DEVIATIONS* OF PAIRED RATER COMPARISONS
FOR EACH TRAINING DEVICE - DEFT III

                                                          Question
Device, task, and subtask                    PD      LD      TA      RD     RLD      TT

Burst on Target, Task 1
  Index Ammunition                Mean     0.00    0.00   24.64       -    0.00   23.75
                                  SD     (0.00)  (0.00) (12.89)
  Turn on Main Gun Switch         Mean     0.50    0.00   26.36       -       -       -
                                  SD     (0.50)  (0.00)  (9.59)
  Announce Identified             Mean     0.00    0.00   22.44       -       -       -
                                  SD     (0.00)  (0.00) (12.04)
  Apply Lead (Simulated)          Mean     0.67    0.17   11.80    0.00    0.33   13.92
                                  SD     (0.47)  (0.17)  (9.11)
  Lay Crosshair Leadline          Mean     0.00    0.00   11.44       -    0.00   16.94
                                  SD     (0.00)  (0.00)     (?)          (0.00)  (9.62)
  Fire Main Gun                   Mean     0.67    0.00   53.46       -
                                  SD     (0.47)

Burst on Target, Task 2
  Sense Round                     Mean     0.00    0.00    8.05    0.00    0.33   12.21
                                  SD     (0.00)  (0.00)  (4.84)          (0.24)  (6.87)
  Announce Sensing & "BOT"        Mean     0.67    0.00    9.73            0.00   18.67
                                  SD     (0.47)  (0.00)  (7.03)          (0.00)  (6.60)
  Relay to New Aiming Point       Mean     1.00    0.00    6.99       -    0.00    6.44
                                  SD     (0.58)  (0.00)  (4.46)          (0.00)  (3.82)
  Fire Main Gun                   Mean     0.67    0.00   48.91       -
                                  SD     (0.47)

E3A, Task 1
  Ensure Power & Cooling Avail.   Mean     1.33    0.00   28.11
                                  SD     (0.94)  (0.00) (13.90)
  Turn on NCS Power               Mean     1.83    0.00   28.23            0.17   24.94
                                  SD     (1.07)  (0.00) (13.77)          (0.10) (11.68)

* Standard deviations are provided when more than two raters supplied
a rating.
Table 30 (Continued)

                                                          Question
Device, task, and subtask                    PD      LD      TA      RD     RLD      TT

E3A, Task 1 (continued)
  Turn Autopilot Off              Mean     2.33    0.00   31.55            0.17   26.25
                                  SD     (1.37)
  Turn Probe Heaters Off          Mean     2.33    0.00   31.55            0.17   27.50
                                  SD     (1.37)  (0.00)                (No D,G)(No D,R)
  Synchronize Horizontal          Mean     1.83    0.00    4.18            0.11   22.50
    Situation Indicator           SD     (1.07)  (0.00)  (1.52)          (0.08)  (8.81)
  INS-1 & INS-2 to Align Mode     Mean     1.83    0.00   14.97               ?   25.00
                                  SD     (1.07)  (0.00)  (6.24)          (0.00) (11.90)
  Test UDC Display & Lamps        Mean     1.83    0.17   14.85            0.11   24.17
                                  SD     (1.07)  (0.17)  (5.87)          (0.08)  (9.80)
                                                                         (No G)  (No G)
  Detect Fault                    Mean     2.17    0.00   37.33
                                  SD     (1.57)  (0.00) (13.56)

E3A, Task 2
  CDUs                            Mean     2.00    0.00   24.97            0.11   15.08
                                  SD     (1.41)  (0.00) (12.23)          (0.08)  (7.07)
  Sim. Restart, Perform Checkout  Mean     2.50    0.00   39.73            0.33   19.90
                                  SD     (1.50)                          (0.24) (16.07)
  INUs                            Mean     2.00    0.00   24.97            0.00   15.08
                                  SD     (1.41)  (0.00) (12.23)          (0.00)  (7.07)
  Sim. Restart, Perform Checkout  Mean     2.50    0.00   48.73            0.36   19.90
                                  SD     (1.50)                          (0.26) (16.07)
  Check 115 VAC Power             Mean     1.50    0.17   22.70            0.78    9.25
                                  SD     (1.50)  (0.17) (11.59)          (0.44)  (4.81)
  Sim. Continuity Check,          Mean     2.00    0.22   27.64            0.33   13.00
    Check Wiring Continuity       SD     (1.41)  (0.16) (10.48)          (0.14)  (5.70)
  Sim. Replacement of Capacitor,  Mean     0.50    0.00   29.27            0.00   19.50
    Replace Shorted Capacitor     SD     (0.50)  (0.00) (10.72)
Table 30 (Continued)

                                                          Question
Device, task, and subtask                    PD      LD      TA      RD     RLD      TT

E3A, Task 2 (continued)
  Sim. Restart, Perform Checkout  Mean     2.50    0.00   48.73            0.33   25.17
                                  SD     (1.50)                          (0.24) (17.80)

VIGS, Task 1
  Index Ammunition                Mean     0.50
                                  SD     (0.50)
  Turn on Main Gun Switch         Mean     0.50
                                  SD     (0.50)
  Announce Identified             Mean     0.67    0.00   27.36
                                  SD     (0.47)  (0.00)
  Apply Lead                      Mean        ?       ?       ?               ?       ?
                                  SD     (0.00)  (0.00)  (8.64)          (0.24)  (4.83)
  Lay Crosshair Leadline          Mean     0.50    0.00   11.77            0.00    6.98
                                  SD     (0.50)  (0.00)  (5.41)          (0.00)  (4.72)
  Fire Main Gun                   Mean     0.50
                                  SD     (0.50)

VIGS, Task 2
  Sense Round                     Mean     0.67    0.00   18.76            0.00   10.46
                                  SD     (0.47)  (0.00) (10.28)          (0.00)
  Announce Sensing & "BOT"        Mean     0.00    0.00   24.86            0.00   14.67
                                  SD     (0.00)  (0.00)     (?)          (0.00)     (?)
  Relay to New Aiming Point       Mean     1.17    0.22   18.44            0.00   12.08
                                  SD     (0.69)  (0.16) (10.70)          (0.00)  (5.76)
  Fire Main Gun                   Mean     0.50
                                  SD     (0.50)

Summary 

Based on the analyses presented in this report, a
number of recommendations can be made regarding
modifications of DEFT:

1. The expected distribution of summary index scores
is too broad to permit meaningful interpretation of DEFT
output unless various assumptions are made regarding the
expected distributions of input variables in the real
world. All of the assumptions we made are defensible
(e.g., a training device will not be built that addresses
no performance deficit); however, a different set of
assumptions would result in different critical values for
inter-device comparisons.

2. The major contributors to output variance are the 
two Efficiency scales. To reduce this problem, it is 
recommended that some transform (e.g., square root) be 
used. 

3. It is recommended that two additional scales be 
added to the DEFT II analyses. These scales would assess 
the proportion of required skills and knowledge contained 
in the training device requirement and the operational 
performance objective that the trainees do not possess. 
4. It is imperative that, when more than one rater
applies DEFT to the evaluation of a device, the raters
agree on their assumptions regarding the device, trainee
population, device utilization, and the meanings of the
various DEFT scales before conducting analyses.

Based on these results, recommendations 2 and 3 above
have been implemented in the most recent DEFT programs.
Presumably, the remaining recommendations would be
implemented by DEFT users.